1. Summary

This report describes the current work in progress for the SAGA project. The highlights of the research in the last six months are:

• Clemma, an automated configuration librarian, is under development. Clemma will provide configuration management and version control capabilities for the SAGA system. It is being implemented using the Troll database and the UNIX file system. A prototype of Clemma is expected to be completed in the Fall of 1986.

• GNU Emacs as an alternative user interface for the Epos editor.

• A formal foundation for the stepwise development of software components, including a formal model for the stepwise development of verified programs and an example of a stepwise development method that falls within the framework of the formal model.

• A survey of software management techniques used at AT&T.

• A design for a project management utility for SAGA.

• An implementation of the COCOMO cost model as a software package.

• A prototype implementation of ENCOMPASS written in a combination of C, Csh, Prolog and Ada.

• Simple implementations of the project management and configuration control systems in the ENCOMPASS prototype, supporting "programming in the small".

• An initial version of ISLET, the language-oriented editor used to create PLEASE specifications and refine them into Ada implementations.

• An initial version of the software that automatically translates PLEASE specifications into Prolog procedures and generates the support code necessary to call these procedures from Ada.

• The run-time support routines and axiom sets for a number of predefined types in ENCOMPASS.

• Interfaces to the ENCOMPASS test harness and TED.

• PLEASE features to support if, while, and assignment statements, as well as procedure calls with in, out, or in out parameters.

• PLEASE features to support a small, fixed set of types, including natural numbers, lists, booleans and characters.

• Use of PLEASE and ENCOMPASS to develop small programs, including specification, prototyping, and mechanical verification.

Appendix A contains a list of twenty theses and papers that document the project. Six of these were produced since the last mid-year report. Appendices B through P contain reports, thesis proposals, papers, and other work produced as part of the NASA project.
2. Overview

Large-scale software development is so expensive that new techniques and methods are required to improve productivity. The software development environment is a proposed solution in which software development methods and paradigms are embedded within a computer software system. The goal of an environment is to provide software developers with a computer-aided specification, design, coding, testing and maintenance system that operates at the level of abstraction of the software development process and the application domains of its intended products.

Proposed software development environments range from simple collections of software tools that enhance the development process to complex systems that support sophisticated software production methods. Every environment must include a representation for the eventual software products and a (perhaps informal) notion of the software development process. In the SAGA project, we have been investigating the principles and practices underlying the construction of a software development environment. In this report, we review our studies and results and discuss the issues of providing practical environments in the short and long term.

Research into software development is required to reduce the cost of producing software and to improve software quality. Modern software systems, such as the embedded software required for NASA's space station initiative, stretch current software engineering techniques. The need to build large, reliable, and maintainable software systems grows with time. Much theoretical and practical research is in progress to improve software engineering techniques. One such technique is to build a software system or environment that directly supports the software engineering process. In this report, we describe research in the SAGA project to design and build a software development environment that automates the software engineering process.

The design of a computer-aided software development environment should be guided by the problems that arise in manual software development methods. Many of these problems are reflected in software cost estimation models and measurements. Software costs are very sensitive to mistakes in the early requirements and design phases of development. Programmers and program testers vary greatly in the productivity and quality of their work. However, high-level languages and software tools to support development may increase the productivity of a programmer. Orders-of-magnitude improvement in the productivity of software engineers might be achieved in many application areas if the products of software engineering can become reusable, that is, if the requirements, design, documentation, validation, and verification of a software system can be reused in maintenance and in building new systems.

The SAGA project is investigating the design and construction of practical software engineering environments for developing and maintaining aerospace systems and applications software. The research includes the practical organization of the software lifecycle, configuration management, software requirements specification, executable specifications, design methodologies, programming, verification, validation and testing, version control, maintenance, the reuse of software, software libraries, documentation and automated management. An overview of the SAGA project components is given in Appendices C and D.

In several of the papers we have produced, we argue for research into formal models of the software development process (Appendices D, F, G, and H). Such formal models should aid experimental evaluation of the practical techniques that are used in the construction of software development environments.

The SAGA project is developing models of configuration, design, incremental development, and management. The concepts and tools resulting from SAGA are being used to develop a prototype software development system called ENCOMPASS (Appendices I and B²). Although the research has developed many general tools and concepts that are independent of the application language and domain, we hope to extend ENCOMPASS to support the development of large, embedded software systems written mainly in Ada.

In the remainder of this report, we describe in more detail the work accomplished this year.

3. ENCOMPASS

An initial prototype of the ENCOMPASS environment has been constructed on a Sun workstation running Unix³. The system uses the Verdix Ada⁴ Development System as well as many tools developed by the SAGA project. The prototype contains simple facilities for configuration control and project management and has a uniform, object-oriented user interface. From ENCOMPASS, the user can invoke IDEAL (Incremental Development Environment for Annotated Languages), which provides facilities for specifying, prototyping, testing and implementing Ada programs.

IDEAL implements a development methodology similar to VDM. Procedures are first specified using pre- and post-conditions written in a subset of first-order predicate logic. These specifications can be automatically transformed into prototypes written in a combination of Ada and Prolog. ENCOMPASS provides tools that support the creation of acceptance tests using these prototypes. To create and refine specifications, the programmer uses ISLET (Incredibly Simple Language-oriented Editing Tool), an incremental, language-oriented editor designed specifically for the incremental refinement of PLEASE specifications.

Using ISLET, the PLEASE specification is incrementally refined into an Ada program. This process is viewed as the construction of a proof in the Hoare calculus. Each refinement is verified before another is applied; therefore, the final program satisfies the original specification. Verification conditions (VCs) are generated from each refinement step. ISLET can certify many VCs using algebraic simplification and simple proof procedures. If these measures fail, ISLET invokes TED as an interface to a general-purpose theorem prover.

² Appendix B contains an early description of our work.

³ Unix is a trademark of AT&T.

⁴ Ada is a trademark of the United States government.

Appendices B and I report more fully on PLEASE and ENCOMPASS. Appendix I contains Bob Terwilliger's Ph.D. preliminary proposal and two supporting papers. The PLEASE paper in Appendix B has been presented at a conference. Appendix J contains a thesis by Phillip Roberts on the translation of predicates to Prolog.
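As a rough illustration of what a verification condition is, the sketch below applies Hoare's assignment axiom, {Q[e/x]} x := e {Q}, to a single assignment. It is a hypothetical Python example, not ISLET's implementation: predicates are modelled as Python functions over a dictionary of variable values, the helper names (`wp_assign`, `vc_assign`, `check_vc`) are invented for this sketch, and instead of algebraic simplification or a theorem prover the resulting condition is merely tested over a small range of integer states, which can expose a false VC but cannot prove a true one.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, int]
Pred = Callable[[State], bool]
Expr = Callable[[State], int]


def wp_assign(var: str, expr: Expr, post: Pred) -> Pred:
    """Weakest precondition of `var := expr` w.r.t. `post` (assignment axiom)."""
    def pre(s: State) -> bool:
        updated = dict(s)
        updated[var] = expr(s)  # evaluate expr in the old state, then substitute
        return post(updated)
    return pre


def vc_assign(pre: Pred, var: str, expr: Expr, post: Pred) -> Pred:
    """Verification condition for {pre} var := expr {post}: pre implies wp."""
    wp = wp_assign(var, expr, post)
    return lambda s: (not pre(s)) or wp(s)


def check_vc(vc: Pred, names: List[str], lo: int = -20, hi: int = 20) -> bool:
    """Test the VC on every state with values in [lo, hi]; this is not a proof."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        if not vc(dict(zip(names, values))):
            return False
    return True


# {x >= 0} x := x + 1 {x > 0}: the VC x >= 0 => x + 1 > 0 holds.
ok = vc_assign(lambda s: s["x"] >= 0, "x", lambda s: s["x"] + 1, lambda s: s["x"] > 0)
# {true} x := x - 1 {x >= 0}: the VC fails, for example when x = 0.
bad = vc_assign(lambda s: True, "x", lambda s: s["x"] - 1, lambda s: s["x"] >= 0)

print(check_vc(ok, ["x"]))   # True
print(check_vc(bad, ["x"]))  # False
```

A real verifier would simplify the formula x >= 0 => x + 1 > 0 symbolically; the exhaustive check is used here only to keep the example self-contained.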

4. Configuration Control

A prototype configuration librarian, Clemma, is currently under development. The goal of the system is to provide a means of organizing, indexing and storing the on-line components of software projects. Users will be able to store both individual files and hierarchies of files as configuration items in the library. An overview of some of the issues involved in configuration management and a description of a small SAGA prototype can be found in the ENCOMPASS paper in Appendix I.

Because, as Nestor pointed out in a recent CMU technical report, there are many deficiencies in using just a file system or a database to represent the components of a software development, we have adopted a combined approach in which both a database and a file system are used. The deficiencies of traditional databases and file systems for representing the components of software development have been known for some time, and several projects are attempting to implement persistent object storage (a French Esprit project is already implementing such a database under Unix). It is unclear, as of this moment, whether these attempts will be successful.

Our approach of combining databases with file systems has the advantage that it permits the rapid prototyping of many of the facilities that are needed. It also obviates the need to construct a complex piece of software, at least until the performance characteristics of persistent object storage are better understood.

Clemma will provide several capabilities:

• Baselines of software modules can be recorded, and updates can be tracked and used to form new baselines.

• Stored modules can be checked out for re-use, with access lists provided to handle problems of permission and change control.

• A browser will be incorporated so that users may more easily find useful modules in the library. This should greatly promote software re-use.

• “Views” of modules will be implemented as hierarchical groupings of stored configuration items. This will greatly aid testing, validation and re-use of software systems.

• By placing constraints on the state of items checked into the library (whether an item is fully documented, tested, etc.), one will be able to implement a development methodology for the software and control the construction and use of individual components.
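To make the capabilities listed above more concrete, here is a minimal in-memory sketch of a librarian with check-in constraints, access-controlled check-out, and baselines. It assumes nothing about Clemma's actual design (which uses the Troll DBMS and the Unix file system; see Appendix M): the classes `Item` and `Library`, their fields, and the choice of "documented" and "tested" as check-in constraints are hypothetical, and views and the browser are omitted.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Set


@dataclass
class Item:
    """A configuration item (a file or a file hierarchy); fields are illustrative."""
    name: str
    version: int = 0
    documented: bool = False
    tested: bool = False


Constraint = Callable[[Item], bool]  # True if the item may be checked in


@dataclass
class Library:
    constraints: List[Constraint] = field(default_factory=list)
    access: Dict[str, Set[str]] = field(default_factory=dict)   # item -> permitted users
    items: Dict[str, Item] = field(default_factory=dict)        # latest stored versions
    checked_out: Dict[str, str] = field(default_factory=dict)   # item -> current holder
    baselines: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def check_in(self, item: Item, user: str) -> None:
        old = self.items.get(item.name)
        # Change control: an existing item may only be updated by its holder.
        if old is not None and self.checked_out.get(item.name) != user:
            raise PermissionError(f"{user} has not checked out {item.name}")
        # Methodology enforcement: every constraint must hold before storing.
        if not all(rule(item) for rule in self.constraints):
            raise ValueError(f"{item.name} violates a check-in constraint")
        new_version = old.version + 1 if old is not None else 1
        self.items[item.name] = replace(item, version=new_version)
        self.checked_out.pop(item.name, None)

    def check_out(self, name: str, user: str) -> Item:
        if user not in self.access.get(name, set()):
            raise PermissionError(f"{user} may not check out {name}")
        if name in self.checked_out:
            raise RuntimeError(f"{name} is held by {self.checked_out[name]}")
        self.checked_out[name] = user
        return replace(self.items[name])  # a working copy, not the stored item

    def record_baseline(self, label: str) -> None:
        # A baseline records the current version number of every stored item.
        self.baselines[label] = {n: it.version for n, it in self.items.items()}


lib = Library(constraints=[lambda it: it.documented, lambda it: it.tested])
lib.access["stack.c"] = {"alice"}
lib.check_in(Item("stack.c", documented=True, tested=True), "alice")
lib.record_baseline("B1")
work = lib.check_out("stack.c", "alice")
lib.check_in(work, "alice")
lib.record_baseline("B2")
print(lib.baselines)  # {'B1': {'stack.c': 1}, 'B2': {'stack.c': 2}}
```

Expressing the development methodology as a list of predicates keeps the policy separate from the storage mechanism, so a project could tighten or relax its check-in rules without changing the librarian itself.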

The system will be written primarily in the C programming language and will use the Troll DBMS and the Unix™ file system for support. The current prototype of Clemma is expected to be completed in the Fall of 1986.

Appendix M contains an early draft of Clemma's design; a more detailed document is being prepared. As of September, major parts of Clemma have been programmed.

5. The Epos Editor

Peter Kirslis completed the major parts of the Epos editor and finished his Ph.D. thesis, which is included as Appendix E. He is continuing development of a SAGA-based editor in his current employment at AT&T in Denver. His new editor will be based on Lex and Yacc and an internal AT&T editor interface. George Beshers' Olorin editor generator system, which is based on regular right-part grammars, is near completion. George is currently revising his Ph.D. thesis, having passed the oral examination.

The prototype user interface to the Epos editor became the major obstacle to deploying Epos for practical software development. To facilitate the integration of several SAGA utilities, we decided to adopt the GNU Emacs extensible editor as the front-end user interface. The Epos incremental parser, the incremental semantics processor, and other SAGA utilities may now be added to the GNU Emacs environment as background processes that communicate with each other through Emacs. Each pair of communicating processes requires an interface programmed in the GNU Emacs extension language, ELisp.

The interface between GNU Emacs and the incremental parser has been completed. GNU Emacs itself was changed to pass all text changes to the interface. The interface collects these changes within local regions and eventually passes them on to the incremental parser. Parsing errors are signalled with an error message, and the unparsed text is highlighted. Highlighting required another, more difficult change to GNU Emacs. User commands that need to look at the parse tree, such as token movement or tree selection, ask the parser to return the appropriate information.
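The collection of changes into local regions described above can be sketched as an interval-merging structure. This Python example is hypothetical and does not reproduce the ELisp interface: each buffer change is assumed to be reported as "the text in [start, old_end) was replaced by new_len characters", and the collector keeps a sorted list of dirty regions in current buffer coordinates, shifting regions that follow a change and merging regions that overlap or touch it, so that the parser would receive a few local regions rather than every keystroke.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end) in current buffer positions


class ChangeCollector:
    """Hypothetical sketch: coalesce editor changes into dirty regions."""

    def __init__(self) -> None:
        self.regions: List[Region] = []

    def record(self, start: int, old_end: int, new_len: int) -> None:
        """Text in [start, old_end) was replaced by new_len characters."""
        delta = new_len - (old_end - start)
        lo, hi = start, start + new_len          # the changed span after the edit
        kept: List[Region] = []
        for s, e in self.regions:
            if e < start:                        # strictly before: unaffected
                kept.append((s, e))
            elif s > old_end:                    # strictly after: shift by delta
                kept.append((s + delta, e + delta))
            else:                                # overlapping or touching: merge
                lo, hi = min(lo, s), max(hi, e + delta)
        kept.append((lo, hi))
        self.regions = sorted(kept)

    def flush(self) -> List[Region]:
        """Hand the pending regions to the parser and start afresh."""
        pending, self.regions = self.regions, []
        return pending


c = ChangeCollector()
c.record(10, 10, 2)    # insert two characters at position 10
c.record(12, 12, 1)    # type one more right after them: merged
c.record(40, 45, 1)    # replace five characters far away: separate region
c.record(0, 2, 0)      # delete two characters at the start: later regions shift
print(c.flush())       # [(0, 0), (8, 11), (38, 39)]
```

Merging touching regions trades a slightly larger reparse for fewer round trips to the parser process, which matters when the parser runs as a separate background process.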

A number of modifications were made to the Epos incremental parser to allow it to be used with the Emacs front end. The primary task was to extract the parser from the Epos editor and to develop an interface of primitive commands to be used by Emacs.

The parse tree representation was upgraded to allow arbitrary text to be stored in the tree (including tabs and trailing blanks). Standard Pascal multi-line comments are now supported, although a change to the terminator of a comment is not yet properly reparsed. A module was also added to allow selection and modification of a range of the parse tree for use by the editor. A number of previously existing bugs in the parser were revealed and fixed while developing this new interface. Appendix L contains a description of the new GNU Emacs-based Epos.

6. Software Engineering Management

We wish to automate much of the control, communication, and tracking associated with the products involved during the lifetime of a software system. To date, we have been looking at various global pictures of the software lifetime to determine what management structures are used and what they require to be used effectively. We would like the management tool to support most management structures of workers (including managers) and documents (including program and management documents).

Appendix O contains a summary of the management techniques used at AT&T Middletown to support the software for System 75, the digital telephone exchange. The summary was collected by Bob Sum during a visit to AT&T and is being correlated with the various lifecycle tasks proposed by NASA. We have also been studying other proposed project management systems. As part of these studies, Professor Campbell attended the Lancaster Software Environments conference, the Trondheim Software Engineering conference, and the RADC KPSA meeting. The most advanced project management systems appear to be those of the Carnegie Group Inc., the Kestrel Institute, Boeing, and TRW. These studies make clear that much remains to be done to integrate project management with the other activities of software development, and that most systems remain primitive or are prototypes.

In Appendix C, Campbell and Terwilliger discuss the notion of tasks being passed between the in trays and out trays of software developers. That paper begins to address the problem of interrelating project management with configuration control and other SAGA tools. The interaction of project management and configuration control has also been prototyped as part of ENCOMPASS; a description of this work can be found in the ENCOMPASS paper in Appendix I. In particular, that paper discusses the need for a finer granularity of milestones. Further extensions of these ideas, which should form part of an eventual management tool, may be found in Appendix N.

Work is now progressing on an implementation of these ideas. This work will build upon Clemma and earlier designs for the project management system.

7. A Model for Stepwise Development of Programs

The task of specifying and designing a software component and verifying that the component satisfies a given specification is quite difficult. An approach that makes this task more manageable is to divide the development of a software component into a series of steps. At each step, the following occur:

(1) The software component is specified. At each step after the first, the specification is an augmentation of the specification at the preceding step.

(2) Design decisions that are consistent with design decisions at preceding steps are made.

(3) It is determined that the (possibly incomplete) software component satisfies its specification.

The Vienna Development Method (VDM) [Jones, 80] is an example of such a stepwise development method.

In order to study the properties of a particular stepwise development method, or to compare different stepwise development methods, it would be advantageous to have a formal model of the stepwise development process. In addition, any attempt to automate this process would benefit from formalizing the notions involved. A formal model has been constructed and is described in some detail in Appendix H; more concise statements of the model can be found in Appendices F and G. The model is conceptually simple and is independent of both the specification method used and the method used for determining that a software component satisfies its specification. It contains formal definitions for such basic ideas as a development, a correct development, a development step, and a correct development step.
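A toy reading of the three conditions listed at the start of this section may help fix ideas. The Python sketch below is not the formal model of Appendix H: a specification is represented as a set of named input/output conditions, "augmentation" is taken to mean only that every earlier condition is retained, satisfaction is checked by testing on sample inputs rather than proved, and condition (2), the consistency of design decisions, is not modelled at all. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Cond = Callable[[int, int], bool]          # relates an input x to an output r
Spec = Dict[str, Cond]                     # a specification: named conditions
Step = Tuple[Spec, Callable[[int], int]]   # (specification, component so far)


def augments(new: Spec, old: Spec) -> bool:
    """Toy condition (1): the new specification keeps every earlier condition."""
    return all(new.get(name) is cond for name, cond in old.items())


def satisfies(step: Step, inputs: List[int]) -> bool:
    """Toy condition (3): checked by testing on sample inputs, not by proof."""
    spec, component = step
    return all(cond(x, component(x)) for x in inputs for cond in spec.values())


def correct_development(steps: List[Step], inputs: List[int]) -> bool:
    """Every step satisfies its spec and each spec augments the previous one."""
    return (all(satisfies(s, inputs) for s in steps)
            and all(augments(b[0], a[0]) for a, b in zip(steps, steps[1:])))


# Hypothetical example: an integer square root developed in three steps.
def nonneg(x: int, r: int) -> bool:
    return r >= 0


def lower(x: int, r: int) -> bool:
    return r * r <= x


def upper(x: int, r: int) -> bool:
    return x < (r + 1) * (r + 1)


def stub(x: int) -> int:
    return 0  # an incomplete component that already meets the first two specs


s1: Spec = {"nonneg": nonneg}
s2: Spec = {**s1, "lower": lower}
s3: Spec = {**s2, "upper": upper}

inputs = list(range(50))
good = [(s1, stub), (s2, stub), (s3, math.isqrt)]
print(correct_development(good, inputs))                                   # True
print(correct_development([(s3, math.isqrt), (s1, math.isqrt)], inputs))  # False
```

The second development fails because the later specification drops conditions that the earlier one imposed, which violates the augmentation requirement even though each component meets its own specification.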

The model has been used in the study of an example of a stepwise development method. The example is a method for the stepwise development of programs that are verified to be partially correct with respect to specifications. The specifications are expressed in terms of pre- and post-conditions. The model has been most helpful in the construction of the example. In one case, the requirements of the model were met in the example because of the soundness and relative completeness of the Hoare calculus. If the example is viewed apart from the model, it is not obvious that these properties of the Hoare calculus are needed. The model was also useful in modifying the Hoare calculus, which is a method for program verification, into a stepwise verification method for software components.

A description of the formal model has been written, and results concerning the properties of the model have been obtained. An example of a stepwise development method based upon the Hoare logic and calculus has been studied in detail, and it has been proved that this development method has the properties of the formal model. The details of the model, the results, and examples are given in the Appendices.


8. Comparison Tools and Software Environments

Carol Beckman has continued her studies into the uses of differences in software development. Her Ph.D. preliminary thesis proposal surveys differencing techniques and discusses the various approaches she is investigating to improve the use of these methods in software development environments (see Appendix K).
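For readers unfamiliar with differencing, the sketch below shows one textbook technique such a survey covers: a line-oriented diff derived from a longest-common-subsequence (LCS) table, the idea behind the classic Unix diff. It is a hypothetical Python example and is not the approach proposed in Appendix K; it uses quadratic time and space, which practical tools work hard to avoid.

```python
from typing import List, Tuple


def lcs_diff(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Return (tag, line) pairs: ' ' unchanged, '-' deleted, '+' inserted."""
    m, n = len(old), len(new)
    # table[i][j] = length of a longest common subsequence of old[i:] and new[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the front, emitting one edit per step.
    script: List[Tuple[str, str]] = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            script.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", old[i]))
            i += 1
        else:
            script.append(("+", new[j]))
            j += 1
    script.extend(("-", line) for line in old[i:])
    script.extend(("+", line) for line in new[j:])
    return script


old = ["begin", "x := 1;", "end"]
new = ["begin", "x := 2;", "y := 3;", "end"]
for tag, line in lcs_diff(old, new):
    print(tag, line)
```

This prints `begin` and `end` as unchanged, `x := 1;` as deleted, and `x := 2;` and `y := 3;` as inserted. Keeping the lines tagged ' ' and '-' always reconstructs the old file, and keeping ' ' and '+' reconstructs the new one.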

9. A COCOMO Cost Estimating Package

As part of a Software Engineering course during the Spring of 1986, Professor Campbell's students implemented a cost estimating package for software development based on Barry Boehm's COCOMO model. Documentation of the package is included in Appendix P.
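For reference, the basic form of Boehm's COCOMO model estimates effort as E = a·KDSI^b person-months and development time as D = c·E^d months, where KDSI is thousands of delivered source instructions and the coefficients depend on the development mode. The Python sketch below uses the published basic-model coefficients from Boehm's Software Engineering Economics (1981). Whether the students' package implements this basic form or the intermediate model with cost-driver multipliers is not stated here, so treat the sketch only as an illustration of the model, not of the package.

```python
from typing import Dict, Tuple

# Basic COCOMO coefficients (a, b, c, d) by development mode, Boehm (1981).
MODES: Dict[str, Tuple[float, float, float, float]] = {
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kdsi: float, mode: str = "organic") -> Tuple[float, float]:
    """Return (effort in person-months, development time in months)."""
    if kdsi <= 0:
        raise ValueError("size must be positive")
    a, b, c, d = MODES[mode]
    effort = a * kdsi ** b
    schedule = c * effort ** d
    return effort, schedule


effort, months = basic_cocomo(32, "organic")
print(f"{effort:.1f} person-months over {months:.1f} months")
# 91.3 person-months over 13.9 months
```

Because b > 1 in every mode, effort grows faster than size, and the embedded mode, typical of the aerospace software SAGA targets, carries the steepest exponent.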
Abstract

PLEASE is an executable specification language that supports program development by incremental refinement. Software components are first specified using a combination of conventional programming languages and mathematics. These abstract components are then incrementally refined into components in an implementation language. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. PLEASE allows a procedure or function to be specified using pre- and post-conditions written in predicate logic, and an abstract data type to be specified with a type invariant. PLEASE specifications may be used in proofs of correctness, and may also be transformed into prototypes that use Prolog to “execute” pre- and post-conditions. The early production of executable prototypes for experimentation and evaluation may enhance the development process.

1. Introduction

It is widely acknowledged that producing correct software is both difficult and expensive. To help remedy this situation, methods of specifying[13,19,20,26,29,31] and verifying[14,16,19,27,38] software have been developed. The SAGA (Software Automation, Generation and Administration) project is investigating both the formal and practical aspects of providing automated support for the full range of software engineering activities[2,6,8,15,23,35]. PLEASE is a language being developed by the SAGA group to support the specification, prototyping, and rigorous development of software components. In this paper we describe the development methodology for which PLEASE was created, give an example of development using the language, and describe the methods used to prototype PLEASE specifications.

A life-cycle model describes the sequence of distinct stages through which a software product passes during its lifetime[10]. There is no single, universally accepted model of the software life-cycle[3,40]. The stages of the life-cycle generate software components, such as code written in programming languages, test data or results, and many types of documentation. In many models, a specification of the system to be built is created early in the life-cycle; as components are produced, they are verified[10] for correctness with respect to this specification. The specification is validated[10] when it is shown to satisfy the customer's requirements.

¹ This research is supported by NASA grant NAG 1-138.

Producing a valid specification is a difficult task. The users of the system may not really know what they want, and they may be unable to communicate their desires to the development team. If the specification is in a formal notation, it may be an ineffective medium for communication with the customers, but natural language specifications are notoriously ambiguous and incomplete. Prototyping[12,24] and the use of executable specification languages[21,22,29,41] have been suggested as partial solutions to these problems. Providing the customers with prototypes for experimentation and evaluation early in the development process may increase customer/developer communication and enhance the validation and design processes.

To help manage the complexity of software design and development, methodologies that combine standard representations, intellectual disciplines, and well-defined techniques have been proposed[17,19,37,39]. For example, it has been suggested that top-down development can help control the complexity of program construction. By using stepwise refinement to create a concrete implementation from an abstract specification, we divide the necessary decisions into smaller, more comprehensible groups. Methods to support the top-down development of programs have been devised[19,32] and put into use[34]. It has also been proposed that software development may be viewed as a sequence of transformations between specifications written at different linguistic levels[2b]; systems to support similar development methodologies have been constructed[30].

The Vienna Development Method[19,34] supports the top-down development of programs specified in a notation suitable for mathematical verification. In this method, programs are first written in a language combining elements from conventional programming languages and mathematics. A procedure or function may be specified using pre- and post-conditions written in predicate logic; similarly, an invariant may be specified for a data type. These abstract programs are then incrementally refined into programs in an implementation language. The refinements are performed one at a time, and each is verified before another is applied; therefore, the final program produced by the development satisfies the original specification.
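The VDM style of specification can be imitated at run time to show what a type invariant and pre- and post-conditions say. In this hypothetical Python sketch, an abstract set of natural numbers is represented as a sorted list without duplicates; the invariant and the conditions on `insert` are checked with `assert` after each operation. The class name and representation are assumptions for the example. Run-time checking only tests the conditions on the executions that actually occur (and is disabled by `python -O`), whereas VDM discharges them by proof.

```python
import bisect
from typing import List


class NatSet:
    """Abstract set of naturals, represented as a strictly increasing list."""

    def __init__(self) -> None:
        self.items: List[int] = []
        assert self.invariant()

    def invariant(self) -> bool:
        xs = self.items
        # Every element is natural, and the list is sorted with no duplicates.
        return all(x >= 0 for x in xs) and all(a < b for a, b in zip(xs, xs[1:]))

    def insert(self, x: int) -> None:
        assert x >= 0, "pre-condition: x is a natural number"
        before = set(self.items)
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)
        # Post-condition: the abstract value is the old set with x added.
        assert set(self.items) == before | {x}, "post-condition"
        assert self.invariant(), "invariant"

    def contains(self, x: int) -> bool:
        i = bisect.bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x


s = NatSet()
for v in (5, 1, 5, 3):
    s.insert(v)
print(s.items)  # [1, 3, 5]
```

The post-condition is stated against the abstract value (a mathematical set), not the list, which mirrors the VDM separation between an abstract specification and its concrete representation.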

Path Pascal[7] is an extension to standard Pascal allowing concurrent programming and encapsulated data types. In Path Pascal, a process is a program structure which has an independent thread of execution; independently executing processes communicate through shared data structures. Encapsulated data types, called objects, are manipulated only by the predefined routines associated with the type. Path expressions[4,5] specify synchronization constraints that apply to the execution of the processes, functions and procedures within objects.
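The kind of ordering constraint a path expression states can be illustrated without Path Pascal. In the hypothetical Python sketch below, a one-slot buffer requires its `put` and `get` operations to alternate, starting with `put`; a condition variable blocks any call that the ordering does not yet permit. This only mimics the effect of such a synchronization constraint with explicit locking; it is neither Path Pascal syntax nor its implementation, where the constraint is declared with the object rather than coded by hand.

```python
import threading
from typing import List, Optional


class AlternatingSlot:
    """One-slot buffer whose put and get must alternate, starting with put."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next = "put"                  # the operation the constraint allows next
        self._value: Optional[int] = None

    def put(self, value: int) -> None:
        with self._cond:
            self._cond.wait_for(lambda: self._next == "put")
            self._value = value
            self._next = "get"
            self._cond.notify_all()

    def get(self) -> Optional[int]:
        with self._cond:
            self._cond.wait_for(lambda: self._next == "get")
            value, self._value = self._value, None
            self._next = "put"
            self._cond.notify_all()
            return value


slot = AlternatingSlot()
received: List[Optional[int]] = []


def consumer() -> None:
    for _ in range(5):
        received.append(slot.get())


worker = threading.Thread(target=consumer)
worker.start()
for i in range(5):
    slot.put(i)               # blocks until the previous value has been taken
worker.join()
print(received)  # [0, 1, 2, 3, 4]
```

The appeal of path expressions is that the ordering rule is written once, as part of the object's declaration, instead of being scattered through the operations as the `wait_for` calls are here.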

PLEASE is an extension of Path Pascal that supports a methodology similar to the Vienna Development Method. In PLEASE, a procedure or function may be specified with pre- and post-conditions written in predicate logic, and similarly an object may be specified using an invariant. For ease of expression, several data types have been added to the language. PLEASE specifications may be used in proofs of correctness; they also may be transformed into prototypes which use Prolog[9] to “execute” pre- and post-conditions, and may interact with other modules written in conventional languages. We believe that the early production of executable prototypes for experimentation and evaluation will enhance the software development process.
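One simple way a post-condition can be "executed", in the spirit of a Prolog search, is generate-and-test: enumerate candidate outputs and return the first that satisfies the post-condition. The Python sketch below illustrates that idea for a sorting specification. It is not the PLEASE-to-Prolog translation described in this paper; the names `execute_spec`, `pre_sort` and `post_sort`, and the choice of permutations as the candidate generator, are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def execute_spec(pre: Callable[[A], bool],
                 post: Callable[[A, B], bool],
                 candidates: Callable[[A], Iterable[B]]) -> Callable[[A], B]:
    """Build a prototype that searches for an output meeting the post-condition."""
    def prototype(x: A) -> B:
        if not pre(x):
            raise ValueError("pre-condition violated")
        for y in candidates(x):
            if post(x, y):
                return y
        raise LookupError("no candidate satisfies the post-condition")
    return prototype


def pre_sort(xs: List[int]) -> bool:
    return True  # every list is an acceptable input


def post_sort(xs: List[int], ys: List[int]) -> bool:
    # ys is a permutation of xs (same multiset) and is in non-decreasing order.
    return Counter(xs) == Counter(ys) and all(a <= b for a, b in zip(ys, ys[1:]))


sort_prototype = execute_spec(pre_sort, post_sort,
                              lambda xs: (list(p) for p in permutations(xs)))
print(sort_prototype([3, 1, 2]))  # [1, 2, 3]
```

Enumerating permutations takes factorial time, which is tolerable only for the small inputs typical of early experimentation with a specification; producing an efficient algorithm is the business of the later refinement steps.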

In section two of this paper, we describe the development methodology PLEASE was designed to support, and in section three, we give an example of program development using PLEASE. First we discuss an example program specification and describe how an executable prototype could be created for it. Then we show a refinement of this specification and discuss the process of verifying that the refined specification satisfies the original. In section four, we give an example of data type specification in PLEASE, and in section five, we discuss the implementation of the system. In section six, we describe the work we have planned for the future, and in section seven, we summarize and draw some conclusions from our experience.
2. Incremental Program Development

Figure 1 shows a view of the life-cycle model that PLEASE was designed to support; a different perspective is given in[35]. In our model, a customer comes to a software development team to have a system constructed. In the requirements definition phase, the functions and properties of the software to be produced by the development are determined[10]. A systems analyst produces a software requirements specification[10], which precisely describes each requirement of the software to be produced. In our model, software requirements specifications are a combination of natural language and components specified in PLEASE. PLEASE specifications may be transformed into prototypes that can be used for experimentation and evaluation; they are also formal specifications of components to be produced, which can be used throughout the rest of the life-cycle. By providing executable components early in the development process, errors in the requirements specification may be discovered and corrected before the internal structure of the system has been defined.

Although a software system may be shown to meet the specification, this does not imply that the system satisfies the customer's requirements. The validation phase attempts to show that any system which satisfies the specification will also satisfy the customer's requirements, that is, that the requirements specification is valid. If not, then the requirements specification should be corrected before the development proceeds any further. In this phase the systems analyst interacts with the users to produce the system validation summary [35], which describes the customer's evaluation of the software requirements specification.

To aid in the validation process, the PLEASE components in the specification may be passed to a prototyping expert who transforms them into executable prototypes which satisfy the specifications. These prototypes may be used by the systems analyst in his interactions with the customers; they may be subjected to a series of tests, be delivered to the customers for experimentation and evaluation, or be installed for production use on a trial basis. The use of prototypes may increase customer/developer communication and enhance the validation process. If it is found that the specification does not satisfy the customers, then it is revised, new prototypes are produced, and the validation process is reinitiated; this cycle is repeated until a validated specification is produced.

Figure 1. Program Development Model




The validated specification then undergoes a refinement, or design transformation, in which more of the structure of the system is defined and implemented. This phase produces a software design specification [ ], which provides a record of the design decisions made during the transformation. During the transformation, prototypes produced from PLEASE specifications may be used in experiments performed to guide the design process. The design transformation may produce components in the implementation language, Path Pascal, as well as an updated requirements specification. Components which have been implemented need not be refined further, but components which are only specified will undergo further refinements until a complete implementation is produced.

Although a new specification has been created, its relationship to the original is unknown. Before further refinements are performed, a verification phase must show that any implementation which satisfies the lower-level specification will also satisfy the upper-level one. In our model, this may be accomplished using any combination of mathematical reasoning [14,19,27,38], testing [11,18,28], technical review [36], and inspection. The use of PLEASE specifications enhances the verification of system components using either testing or proof techniques. The specification of a component can be transformed into a prototype, and this prototype may be used as a test oracle against which the implementation can be compared. Since the specification is formal, proof techniques may be used which range from a very detailed, completely formal proof using mechanical theorem proving to an argument presented as in a mathematics text. PLEASE provides a framework for the rigorous [19] development of programs. Although detailed formal proofs are not required at every step, the framework is present so that they can be constructed if necessary. Parts of a project may use detailed formal verification while other, less critical parts may be handled using less expensive techniques.

To clarify our model further and show how PLEASE specifications enhance the development process, we will consider an example of system development. We will follow the development through requirements definition, validation of the original requirements specification, a single refinement step, and verification of the design transformation.
3. An Example of Program Development

Assume that a customer needs a program which sorts a list of integers. The program should read the list from input, produce a sorted list which is a permutation of the original, and write the sorted list to output. A pre-existing module implementing lists of integers is to be reused. In the requirements definition phase, the customer discusses his needs with the systems analyst and a requirements specification is produced. Along with other documentation, this specification might contain a sort program specified in PLEASE.

3.1. Specifying a Program

Figure 2 shows a PLEASE specification for such a program. The specification uses the component integer_list.spec, which specifies the module integer_list². This module uses the PLEASE type list to define the type integer_list as list of integer. In PLEASE, as in Lisp or Prolog, lists may have varying lengths and there is no explicit allocation or release of storage. However, in PLEASE the strong typing of Pascal is retained and all the elements of a list must have the same type. In PLEASE, a list is denoted by a comma-separated list of elements surrounded by < and >. The function hd(L) returns the first element in a list L and the function tl(L) returns L with the first element removed. The function L1 || L2 yields the concatenation of the elements of L1 and L2, and the constant empty_list denotes a list containing no elements.

The specification for the sort program defines the predicates permutation and sorted, as well as giving pre- and post-conditions for the program. In PLEASE, a predicate defines a logical expression which can be used elsewhere. It syntactically resembles a procedure and may contain local type, variable, function or predicate definitions. The predicate permutation states that two lists are permutations of each other if both of the lists are empty, or if the first element in the second list is in the first list, and the remainders of the two lists are permutations of each other. The predicate sorted states that a list is sorted if it is empty, or if the first element in the list is the smallest and the rest of the list is also sorted. This predicate may be read as: a list L is sorted if L is empty, or if for all X such that X is a member of the tail of L, X is greater than or equal to the head of L, and the tail of L is sorted.

² The statement #include "integer_list.spec" instructs a pre-processor to include text from the file integer_list.spec into the specification before further processing.

    program sort(input, output);

    #include "integer_list.spec"

    var input_list, output_list : integer_list;

    predicate permutation(list1, list2 : integer_list);
    var front, back : integer_list;
    begin
        (list1 = empty_list) and (list2 = empty_list)
        or
        (list1 = front || < hd(list2) > || back) and
        permutation(front || back, tl(list2))
    end;

    predicate sorted(l : integer_list);
    var x : integer;
    begin
        (l = empty_list)
        or
        forall(x | member(x, tl(l)), x >= hd(l)) and
        sorted(tl(l))
    end;

    pre_condition;
    begin
        text_to_integer_list(input) <> integer_list_error
    end;

    post_condition;
    begin
        (input_list = text_to_integer_list(input)) and
        permutation(input_list, output_list) and
        sorted(output_list) and
        (output_list = text_to_integer_list(output'))
    end;

    begin
    end.

Figure 2. Specification of Sort Program
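To make the two predicates concrete, here is a minimal Python transcription of permutation and sorted, written as checkers over Python lists rather than in PLEASE or Prolog. The names permutation and is_sorted are illustrative (is_sorted avoids shadowing Python's built-in sorted), and the sketch is not part of the PLEASE system.

```python
from typing import Sequence


def permutation(list1: Sequence[int], list2: Sequence[int]) -> bool:
    """Transcription of the PLEASE predicate `permutation` (illustrative)."""
    if len(list1) == 0 and len(list2) == 0:
        return True
    if len(list2) == 0:
        return False
    head2, tail2 = list2[0], list2[1:]
    # There exist front, back such that list1 = front || <hd(list2)> || back ...
    for i, item in enumerate(list1):
        if item == head2:
            front, back = list1[:i], list1[i + 1:]
            # ... and front || back is a permutation of tl(list2).
            if permutation(list(front) + list(back), tail2):
                return True
    return False


def is_sorted(lst: Sequence[int]) -> bool:
    """Transcription of the PLEASE predicate `sorted` (illustrative)."""
    if len(lst) == 0:
        return True
    head, tail = lst[0], lst[1:]
    # For all x in tl(L), x >= hd(L); and tl(L) is itself sorted.
    return all(x >= head for x in tail) and is_sorted(tail)
```

For example, permutation([3, 1, 2], [1, 2, 3]) and is_sorted([1, 2, 2, 5]) hold, while is_sorted([2, 1]) does not. Trying every split of list1, rather than only the first occurrence of the head, mirrors the existential reading of front and back in the predicate.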

In PLEASE, the pre-condition for a program specifies the conditions that the input data must meet before execution begins. The post-condition specifies the conditions, possibly relative to the input, that the output must meet after execution has been completed. The pre-condition for the program sort specifies that the input file must contain the text representation for a valid list of integers. The function text_to_integer_list projects from objects of type text onto objects of type integer_list, and returns the constant integer_list_error for inputs which are not valid. The post-condition for sort states that when the input and output files are projected onto integer_lists, the output is a permutation of the input and the output is sorted. The notation output' denotes the value of output after the program has executed, while output denotes the value before execution begins.
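The pre- and post-conditions can be sketched as executable checks in Python. The report does not fix the text representation of a list at this point, so the sketch assumes, hypothetically, whitespace-separated decimal integers; INTEGER_LIST_ERROR is an illustrative stand-in for the constant integer_list_error. Note that this only checks a given output; finding one is a different matter, as the next section explains.

```python
from typing import List, Union

# Hypothetical stand-in for the PLEASE constant integer_list_error.
INTEGER_LIST_ERROR = "integer_list_error"


def text_to_integer_list(text: str) -> Union[List[int], str]:
    """Project text onto an integer list, or return the error constant.

    Assumption: the text form is whitespace-separated decimal integers.
    """
    try:
        return [int(token) for token in text.split()]
    except ValueError:
        return INTEGER_LIST_ERROR


def sort_pre_condition(input_text: str) -> bool:
    """pre_condition of program sort: the input denotes a valid integer list."""
    return text_to_integer_list(input_text) != INTEGER_LIST_ERROR


def sort_post_condition_holds(input_text: str, output_text: str) -> bool:
    """Check the post-condition for a given input/output pair.

    For integers, "output is a sorted permutation of input" is equivalent to
    output == sorted(input), which this check uses as a shortcut.
    """
    input_list = text_to_integer_list(input_text)
    output_list = text_to_integer_list(output_text)
    if input_list == INTEGER_LIST_ERROR or output_list == INTEGER_LIST_ERROR:
        return False
    return output_list == sorted(input_list)
```

With these assumptions, sort_pre_condition("3 1 2") is true, sort_pre_condition("3 x 2") is false, and sort_post_condition_holds("3 1 2", "1 2 3") is true.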

After the requirements specification has been created, it must be validated. The systems analyst can discuss the specification with the customer and obtain test data and expected results for the system. The PLEASE specification then can be given to an expert prototyper, who can produce a prototype which satisfies the specification. If the prototype performs correctly on the test data, it can be delivered to the customer for evaluation. If the prototype does not perform correctly, then we know the specification is invalid³.

³ Note that if the prototype does satisfy the customer, we know only that a particular implementation does so. This does not necessarily mean that all implementations which satisfy the specification would be considered adequate by the customer. While prototypes may enhance the validation process, they do not replace communication with customers and review of the specification.

3.2. Prototyping the Specification

Figure 3 shows a simplified version of the Prolog code which might be produced from the specification of the sort program by an expert prototyper. There are Prolog procedures for the predicates permutation and sorted, as well as for the program pre- and post-conditions and the program as a whole. The procedure sort simply reads the input, executes the pre-condition, executes the post-condition, and then writes the output. The notion of execution is quite different for pre- and post-conditions. Executing a pre-condition involves checking that given data satisfies a logical expression; for example, sort_pre_condition simply checks that the function text_to_integer_list does not return the error indication when called with the input to the program. Executing a post-condition means finding data that satisfies a logical expression; for example, sort_post_condition must find a value for the output such that when the input and output are projected to lists of integers, the input and output are permutations of each other and the output is sorted.

    permutation([], []).
    permutation(List1, [Head2|Tail2]) :-
        append(Front, [Head2|Back], List1),
        append(Front, Back, Temp),
        permutation(Temp, Tail2).

    sorted([]).
    sorted(L) :-
        tl(L, Tail),
        hd(L, Head),
        forall(member(X, Tail), (X >= Head)),
        sorted(Tail).

    sort_pre_condition(Input) :-
        not(text_to_integer_list(Input, integer_list_error)).

    sort_post_condition(Input, Output) :-
        text_to_integer_list(Input, Input_list),
        permutation(Input_list, Output_list),
        sorted(Output_list),
        text_to_integer_list(Output, Output_list).

    sort :-
        read(Input),
        sort_pre_condition(Input),
        sort_post_condition(Input, Output),
        write(Output).

Figure 3. Prolog Code Produced from Sort Specification

To accomplish this, sort_post_condition converts the input data from text form, performs a naive sort, and converts the output back to text. The procedure permutation functions as a generator and the procedure sorted as a selector. When sort_post_condition is invoked, text_to_integer_list is called to convert from text to lists of integers, permutation is called to generate a permutation of the input list, and sorted is then called to determine if the permutation is sorted. If sorted fails, then execution backtracks and permutation generates the next permutation to be evaluated. This continues until a sorted permutation is generated. At this point sorted succeeds, text_to_integer_list is called to convert the output to text format, and sort_post_condition returns.
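The generate-and-test behaviour of sort_post_condition can be imitated in Python, with a generator standing in for Prolog backtracking. This is an illustrative sketch, not the prototype itself: permutations enumerates candidates in the same order as the two permutation clauses of Figure 3 (the first append/3 solution picks Head2 from the front of the list), is_sorted plays the role of the selector, and naive_sort additionally counts how many candidates were tested.

```python
from typing import Iterator, List, Optional, Tuple


def permutations(list1: List[int]) -> Iterator[List[int]]:
    """Generator mirroring the Prolog permutation/2 clauses of Figure 3.

    For each split list1 = Front ++ [Head2] ++ Back, emit Head2 followed by
    every permutation of Front ++ Back (the Temp list in Figure 3).
    """
    if not list1:
        yield []
        return
    for i in range(len(list1)):
        front, head2, back = list1[:i], list1[i], list1[i + 1:]
        for tail2 in permutations(front + back):
            yield [head2] + tail2


def is_sorted(lst: List[int]) -> bool:
    """Selector; checking adjacent pairs is equivalent to the spec by transitivity."""
    return all(lst[i] <= lst[i + 1] for i in range(len(lst) - 1))


def naive_sort(input_list: List[int]) -> Tuple[Optional[List[int]], int]:
    """Return the first sorted candidate and the number of candidates tried."""
    tried = 0
    for candidate in permutations(input_list):    # generator
        tried += 1
        if is_sorted(candidate):                  # selector
            return candidate, tried
    return None, tried                            # not reached: one is always sorted
```

For instance, naive_sort([3, 2, 1]) returns ([1, 2, 3], 6): the sorted permutation is the last of the 3! = 6 candidates, which is the worst case.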

Although this program produces a sorted list of integers, its performance will be quite poor; in the worst case, all n! permutations of an n-element input list will be generated and tested. The performance could be improved by substituting a pre-existing procedure which implements a superior sorting algorithm for the section of sort_post_condition which actually performs the sort. A prototyping expert might search libraries of specifications and prototypes to find reusable components which would improve the performance of the prototype under construction. A prototype with better performance characteristics might be subjected to more extensive testing and evaluation before further design transformations are applied. After the specification for sort has been validated, it can be transformed into a more concrete form.

3.3. Refining the Specification

Assume that a decision is made to implement the program using the quicksort algorithm. As a first step, the original specification might be refined to produce a PLEASE program which converts the input from text to lists of integers, calls a procedure sort to produce a sorted list, converts this list to text, and then writes the text to output. Figure 4 shows the specification of the procedure sort which would be used in such a program. This procedure takes a list of integers as input and produces a sorted list as output. First, an element is selected from the input list and the list is partitioned into two sublists, low and high, so that all the members of low are less than or equal to the selected element and all the members of high are greater than or equal to it. The lists low and high are then sorted recursively and the results combined to form a sorted permutation of the input.
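A Python sketch with the same shape as Figure 4 may help: select, partition and combine are separate functions, and quicksort composes them as the refined sort procedure does. The particular choices inside them (taking the first element, sending elements equal to it to low) are assumptions of this sketch; Figure 4 deliberately leaves them open.

```python
from typing import List, Tuple


def select(input_list: List[int]) -> int:
    # Post-condition: member(element, input). Taking the first element is an
    # arbitrary choice of this sketch; any member would do.
    return input_list[0]


def partition(lst: List[int], element: int) -> Tuple[List[int], List[int]]:
    # Post-condition: lst is a permutation of low || <element> || high, with
    # every member of low <= element and every member of high >= element.
    rest = list(lst)
    rest.remove(element)          # remove exactly one occurrence of element
    low = [x for x in rest if x <= element]
    high = [x for x in rest if x > element]
    return low, high


def combine(sorted_low: List[int], element: int,
            sorted_high: List[int]) -> List[int]:
    # Post-condition: output' = sorted_low || <element> || sorted_high.
    return sorted_low + [element] + sorted_high


def quicksort(input_list: List[int]) -> List[int]:
    """Body of the refined sort procedure."""
    if not input_list:                            # input = empty_list
        return []
    element = select(input_list)
    low, high = partition(input_list, element)
    return combine(quicksort(low), element, quicksort(high))


if __name__ == "__main__":
    print(quicksort([3, 1, 2, 3, 0]))   # [0, 1, 2, 3, 3]
```

Because partition removes the selected element before splitting, both recursive calls receive strictly shorter lists, so the recursion terminates.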

Although this refinement has narrowed the possible implementations to those using the quicksort algorithm, there are still many design decisions left unmade. The new specification may be refined into a family of quicksort programs; these programs might differ in many characteristics, but all would satisfy the specification. For example, the specification for the procedure select only requires that element be a member of the input list; the algorithm used to select a particular element is not specified at this level of abstraction.
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procedure  sortCinput  : integer_list  ; var  output  : integer_list)  ; 
var  element  : integer  ; 

less,  greater,  sorted_high,  sorted_low  : integer_list  ; 

procedure  selectCinput  : integer_list,  var  element  : integer)  ; 
pre-condition  ; 

begin  true  end  ; 
post_condition  ; 

begin  member (element,  input)  end  ; 

procedure  partition (list  : integer_list  ; element  : integer  ; 
var  low,  high  : integer-list  ) ; 
pre_condition  ; 

begin  member (element,  list)  end  ; 

post-Condition  ; 

var  1,  h : integer  ; 
begin 

permutation ( list,  low  I 1 < element  > f I high)  and 
forall(  1 I memberO,  low),  1 <=  element  ) and 
foralK  h I member(h,  high),  h >=  element) 

end  ; 

procedure  combine (sorted_low  : integer— list  ; element  : integer  ; 

sorted-high  : integer-list  ; var  output  : integer-list)  ; 
pre_condition  ; 

begin  true  end  ; 
post-Condition  ; 

begin  output'  = sorted_low  ! I element  I I sorted-high  end  ; 

precondition  ; 

begin  true  end  ; 
post_condition  ; 

begin  permutation (input,  output)  and  sorted (output)  end; 
begin  (*  sort  *) 

if  (input  = empty_list)  then  output  :=  empty-list 
else  begin 

select  (input,  element)  ; 
partition(input,  element,  low,  high)  ; 
sort(low, sorted_low)  ; sort(high,  sorted_high)  ; 
combine (sorted_low,  element,  sorted-high,  output)  ; 

end  ; 

end  ; (*  sort  *) 


Figure  4.  Part  of  Refinement  of  Sort  Specification 


Similarly, the specification for partition only states that all the elements in low are less than or equal to element and all the elements in high are greater than or equal to element; it says nothing about the algorithm used to produce these lists. As the specification is refined further these algorithms will be defined, thereby narrowing the acceptable implementations. The data types used may undergo refinement as well as the algorithms; for example, the module integer_list may be refined to use an array instead of a list representation. However, before the new specification is refined further, it must be shown that any program which satisfies the new specification will also satisfy the original.
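Because partition is specified only by its post-condition, more than one implementation can satisfy it. The Python sketch below writes that post-condition as an executable check and applies it to two hypothetical partition strategies that place elements equal to element on different sides; both pass, which illustrates the family of acceptable implementations. Multiset equality via collections.Counter is used here as a shortcut for the recursive permutation predicate.

```python
from collections import Counter
from typing import List, Tuple


def partition_post(lst: List[int], element: int,
                   low: List[int], high: List[int]) -> bool:
    """Executable form of partition's post_condition in Figure 4."""
    return (Counter(lst) == Counter(low + [element] + high)   # permutation
            and all(x <= element for x in low)
            and all(x >= element for x in high))


def remove_one(lst: List[int], element: int) -> List[int]:
    rest = list(lst)
    rest.remove(element)      # the pre-condition member(element, list) holds
    return rest


def partition_equal_low(lst: List[int], element: int) -> Tuple[List[int], List[int]]:
    """Hypothetical strategy 1: elements equal to element go to low."""
    rest = remove_one(lst, element)
    return [x for x in rest if x <= element], [x for x in rest if x > element]


def partition_equal_high(lst: List[int], element: int) -> Tuple[List[int], List[int]]:
    """Hypothetical strategy 2: elements equal to element go to high."""
    rest = remove_one(lst, element)
    return [x for x in rest if x < element], [x for x in rest if x >= element]
```

On the list [4, 2, 4, 1, 5] with element 4, the two strategies return ([2, 4, 1], [5]) and ([2, 1], [4, 5]) respectively, and partition_post accepts both.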

3.4. Verifying the Refinement

A number of different methods may be used to show that the refined specification satisfies the original. In the most informal case, inspection of the original and refined specifications by a senior designer, or some type of peer review process, might be used. A more rigorous approach might run prototypes produced from the original and refined specifications on the same test data and compare the results; this method gives significant assurance at low cost. However, in the words of E. W. Dijkstra, “Program testing can be used to show the presence of bugs, never to show their absence.” In the most rigorous case, mathematical reasoning would be used.
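Running prototypes from two levels of specification on the same data can be sketched in Python as a small differential test. Here original_prototype stands for the generate-and-test prototype of the original specification and refined_prototype for a quicksort in the shape of Figure 4; both names, the random input generator, the fixed seed and the trial counts are assumptions made for illustration.

```python
import itertools
import random
from typing import List


def original_prototype(xs: List[int]) -> List[int]:
    """Generate-and-test: return the first sorted permutation of xs."""
    for candidate in itertools.permutations(xs):
        if all(a <= b for a, b in zip(candidate, candidate[1:])):
            return list(candidate)
    raise AssertionError("unreachable: some permutation is always sorted")


def refined_prototype(xs: List[int]) -> List[int]:
    """Quicksort in the shape of Figure 4 (selects the first element)."""
    if not xs:
        return []
    element, rest = xs[0], xs[1:]
    low = [x for x in rest if x <= element]
    high = [x for x in rest if x > element]
    return refined_prototype(low) + [element] + refined_prototype(high)


def differential_test(trials: int = 200, max_len: int = 6,
                      seed: int = 1986) -> List[List[int]]:
    """Run both prototypes on the same random inputs; return disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, max_len))]
        if original_prototype(data) != refined_prototype(data):
            mismatches.append(data)
    return mismatches


if __name__ == "__main__":
    print(differential_test())   # an empty list means no disagreement was found
```

An empty mismatch list is evidence, not proof, that the refinement is correct; as the Dijkstra quotation warns, such a test can only expose disagreements. The small max_len keeps the factorial cost of the original prototype manageable.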

The Vienna Development Method [19] provides rules that can be used to generate verification conditions for a refinement. If the verification conditions are always true, then any implementation which satisfies the refined specification will also satisfy the original. Figure 5 shows the verification rules for sequential and conditional statements. $\mathit{pre\_OP}_i(\sigma)$ is the pre-condition for $OP_i$; $\sigma$ represents the parameters, explicit or implicit, to the pre-condition. Each $OP_i$ is verified separately. Rule $d_i$ guarantees that if the pre-condition for $OP$ is true before the sequence begins execution and $OP_1$ through $OP_{i-1}$ execute correctly, then the pre-condition for $OP_i$ will be true. Rule $r_1$ guarantees that if $OP_1$ through $OP_n$ execute correctly, then the post-condition for the entire sequence will be true.

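To generate verification conditions, the appropriate pre- and post-conditions are simply substituted into the verification rules. For example, to generate verification conditions for the sort procedure, the rule for conditional statements is applied first; the expression

    input = empty_list

is substituted for $e$,

    output := empty_list

for $OP_1$, and

    begin select(input, element); ... end

for $OP_2$. Pre- and post-conditions for the begin ... end block then are generated to facilitate the proof.

For $OP = OP_1;\ OP_2;\ \ldots;\ OP_n$ to be correct, show:

$$
\begin{aligned}
&\text{d1.} && \mathit{pre\_OP}(\sigma_1) \Rightarrow \mathit{pre\_OP}_1(\sigma_1)\\
&\text{d2.} && \mathit{pre\_OP}(\sigma_1) \land \mathit{post\_OP}_1(\sigma_1,\sigma_2) \Rightarrow \mathit{pre\_OP}_2(\sigma_2)\\
&\text{d3.} && \mathit{pre\_OP}(\sigma_1) \land \mathit{post\_OP}_1(\sigma_1,\sigma_2) \land \mathit{post\_OP}_2(\sigma_2,\sigma_3) \Rightarrow \mathit{pre\_OP}_3(\sigma_3)\\
&&& \quad\vdots\\
&\text{dn.} && \mathit{pre\_OP}(\sigma_1) \land \mathit{post\_OP}_1(\sigma_1,\sigma_2) \land \mathit{post\_OP}_2(\sigma_2,\sigma_3) \land \cdots \land \mathit{post\_OP}_{n-1}(\sigma_{n-1},\sigma_n) \Rightarrow \mathit{pre\_OP}_n(\sigma_n)\\
&\text{r1.} && \mathit{pre\_OP}(\sigma_1) \land \mathit{post\_OP}_1(\sigma_1,\sigma_2) \land \mathit{post\_OP}_2(\sigma_2,\sigma_3) \land \cdots \land \mathit{post\_OP}_n(\sigma_n,\sigma_{n+1}) \Rightarrow \mathit{post\_OP}(\sigma_1,\sigma_{n+1})
\end{aligned}
$$

For $OP = \mathbf{IF}\ e\ \mathbf{THEN}\ OP_1\ \mathbf{ELSE}\ OP_2$ to be correct, show:

$$
\begin{aligned}
&\text{da.} && \mathit{pre\_OP}(\sigma) \land \mathit{eval}(e,\sigma) \Rightarrow \mathit{pre\_OP}_1(\sigma)\\
&\text{db.} && \mathit{pre\_OP}(\sigma) \land \lnot\,\mathit{eval}(e,\sigma) \Rightarrow \mathit{pre\_OP}_2(\sigma)\\
&\text{ra.} && \mathit{pre\_OP}(\sigma_1) \land \mathit{eval}(e,\sigma_1) \land \mathit{post\_OP}_1(\sigma_1,\sigma_2) \Rightarrow \mathit{post\_OP}(\sigma_1,\sigma_2)\\
&\text{rb.} && \mathit{pre\_OP}(\sigma_1) \land \lnot\,\mathit{eval}(e,\sigma_1) \land \mathit{post\_OP}_2(\sigma_1,\sigma_2) \Rightarrow \mathit{post\_OP}(\sigma_1,\sigma_2)
\end{aligned}
$$

Figure 5. Verification Rules for Sequential and Conditional Statements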
The rule for sequential statements then is applied with begin ... end substituted for $OP$, select(...) for $OP_1$, partition(...) for $OP_2$, sort(low, sorted_low) for $OP_3$, sort(high, sorted_high) for $OP_4$, and combine(...) for $OP_5$. If the formulae produced by these substitutions are always true, then any implementations of select, partition, and combine which satisfy the appropriate pre- and post-conditions will produce a correct implementation of sort.
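The substitution step is mechanical enough to automate. Below is a minimal Python sketch that instantiates the rules of Figure 5 as plain text, with abstract states s1, s2, ... in place of the σ's. The function names, the step labels and the output format are hypothetical; a real tool would also substitute the actual parameters of each call and the actual pre- and post-condition bodies.

```python
from typing import List


def sequence_vcs(op: str, steps: List[str]) -> List[str]:
    """Rules d1..dn and r1 for OP = steps[0]; steps[1]; ...; steps[n-1]."""
    n = len(steps)
    vcs = []
    for i in range(1, n + 1):
        # Hypotheses: pre_OP(s1) plus the post-conditions of OP1 .. OP(i-1).
        hyps = [f"pre_{op}(s1)"]
        hyps += [f"post_{steps[j - 1]}(s{j}, s{j + 1})" for j in range(1, i)]
        vcs.append(f"d{i}. " + " and ".join(hyps) + f" => pre_{steps[i - 1]}(s{i})")
    # r1: all n post-conditions together establish post_OP(s1, s(n+1)).
    hyps = [f"pre_{op}(s1)"]
    hyps += [f"post_{steps[j - 1]}(s{j}, s{j + 1})" for j in range(1, n + 1)]
    vcs.append("r1. " + " and ".join(hyps) + f" => post_{op}(s1, s{n + 1})")
    return vcs


def conditional_vcs(op: str, cond: str, then_op: str, else_op: str) -> List[str]:
    """Rules da, db, ra, rb for OP = IF cond THEN then_op ELSE else_op."""
    return [
        f"da. pre_{op}(s) and eval({cond}, s) => pre_{then_op}(s)",
        f"db. pre_{op}(s) and not eval({cond}, s) => pre_{else_op}(s)",
        f"ra. pre_{op}(s1) and eval({cond}, s1) and post_{then_op}(s1, s2)"
        f" => post_{op}(s1, s2)",
        f"rb. pre_{op}(s1) and not eval({cond}, s1) and post_{else_op}(s1, s2)"
        f" => post_{op}(s1, s2)",
    ]


if __name__ == "__main__":
    # The sort example: first the conditional, then the begin ... end block.
    for vc in conditional_vcs("sort", "input = empty_list", "assign", "block"):
        print(vc)
    for vc in sequence_vcs("block", ["select", "partition", "sort_low",
                                     "sort_high", "combine"]):
        print(vc)
```

For the five-statement block this yields six formulae (d1 to d5 and r1), each of which could then be handed to inspection or to a theorem prover.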

Automated tools may be used to perform the appropriate substitutions and format the resulting logical formulae. These formulae may then be proved by inspection, rigorous argument, or using an automatic theorem prover; the SAGA project has developed a system which supports the creation and management of proofs using a number of automated theorem provers [15]. Once the refinement has been verified, the new specification may be refined further, and the process repeated until an implementation is produced. Although this example shows only the specification of an entire program, PLEASE may also be used to specify separately compiled components such as abstract data types.

4. Specifying Abstract Data Types

It has been proposed that the use of abstract data types can enhance program specification and verification [13,14,20,26,29]. In PLEASE, abstract data types may be specified using an extension of Path Pascal objects. Figure 6 shows the specification of an object implementing a stack of integers in terms of the type integer_list, or list of integer. An object has a scope like a procedure or function; the variables declared local to the object form its state [19], in this case a single variable of type integer_list. The invariant defines the set of legal states, in other words the permitted values of the state variables; the invariant must be true both before and after the execution of any procedure which manipulates the state. The post-condition for a procedure or function associated with an object should specify the value of the state at the end of execution, as well as the values of any output parameters.

The stack has four entry procedures and functions, which may be called from outside the object; any procedures or functions not so declared may not be invoked from an external scope. The first item in the object is the path expression, which can be used to specify synchronization constraints; in this case no constraints are specified, so all execution sequences are allowed. The procedure push takes an integer and puts it on the stack, while the procedure pop removes the top element from the stack. The function top returns the integer at the top of the stack while the function empty checks if any items are on the stack. The initially block is executed when storage for the object is allocated and may be used to set the initial value of the state.

    type stack = object
        path push, pop, top, empty end;

        var s : integer_list;

        invariant;
            begin true end;

        entry procedure push(element : integer);
        pre_condition;
            begin true end;
        post_condition;
            begin s' = < element > || s end;

        entry procedure pop;
        pre_condition;
            begin true end;
        post_condition;
            begin s' = tl(s) end;

        entry function top : integer;
        pre_condition;
            begin not(empty) end;
        post_condition;
            begin s' = s and top' = hd(s) end;

        entry function empty : boolean;
        pre_condition;
            begin true end;
        post_condition;
            begin
                (empty' = true and s = empty_list) or
                (empty' = false and s <> empty_list)
            end;

        initially;
        pre_condition;
            begin true end;
        post_condition;
            begin s' = empty_list end;

    end; (* stack *)

Figure 6. Stack of Integers in Terms of integer_list
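One way to see what the pre- and post-conditions of Figure 6 demand is to run them as checks. The Python class below is a sketch under stated assumptions: the state s is a Python list with the top at index 0 (so push matches s' = < element > || s), post-conditions are checked with assert statements (which python -O disables), and the path expression is not modelled because it imposes no constraint here.

```python
from typing import List


class Stack:
    """Sketch of the Figure 6 stack object with its contracts as runtime checks."""

    def __init__(self) -> None:
        # initially: post_condition s' = empty_list
        self._s: List[int] = []
        assert self._s == [] and self._invariant()

    def _invariant(self) -> bool:
        return True                      # the Figure 6 invariant is `true`

    def push(self, element: int) -> None:
        old = list(self._s)
        self._s = [element] + self._s
        # post_condition: s' = <element> || s
        assert self._s == [element] + old and self._invariant()

    def pop(self) -> None:
        old = list(self._s)
        # Figure 6 gives pop the pre-condition `true`; on an empty stack this
        # sketch's own choice is to leave s empty (tl(empty_list) is not defined).
        self._s = self._s[1:]
        assert self._s == old[1:] and self._invariant()

    def top(self) -> int:
        if self.empty():                 # pre_condition: not(empty)
            raise ValueError("top: pre-condition not(empty) violated")
        old = list(self._s)
        result = self._s[0]
        # post_condition: s' = s and top' = hd(s)
        assert self._s == old and result == old[0]
        return result

    def empty(self) -> bool:
        result = len(self._s) == 0
        # post_condition: (empty' = true and s = empty_list) or
        #                 (empty' = false and s <> empty_list)
        assert (result and self._s == []) or (not result and self._s != [])
        return result
```

After Stack() followed by push(1) and push(2), top() returns 2; after one pop() it returns 1, and calling top() on an empty stack raises ValueError because the pre-condition not(empty) fails.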

5. Implementation

A prototype implementation of PLEASE is being constructed on a Vax running BSD 4.2 Unix⁴. In this implementation, PLEASE specifications are transformed into code for the UNSW Prolog Interpreter [33]. In a program which combines modules written in conventional languages with PLEASE prototypes, the Prolog interpreter is run as a co-routine which uses Unix pipes to communicate with the rest of the program. When a call is made to a routine which is implemented using Prolog, the parameters are converted to the appropriate format and sent down the pipe to the interpreter. When the execution is complete, the results are sent back up the pipe, converted to the proper format, and the call returns. A set of standard representations for PLEASE data types has been devised, and routines to manipulate these representations have been added to the Prolog run-time library.
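The call protocol can be sketched in Python, with a child Python process standing in for the UNSW Prolog interpreter. Everything concrete here is an assumption for illustration: the one-line request format name([args]), the Prolog-style list syntax used as the "standard representation", and the class name PrologCoroutine. Only the structure follows the description above: convert the parameters, send them down a pipe, wait for the reply, and convert it back.

```python
import subprocess
import sys
from typing import List

# Child program standing in for the Prolog interpreter: it reads one request
# per line, e.g. "sort([3,1,2])", and replies with one list per line.
CHILD = r'''
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    name, _, rest = line.strip().partition("(")
    body = rest.rstrip(")").strip("[]")
    values = [int(t) for t in body.split(",") if t.strip()]
    reply = sorted(values) if name == "sort" else values
    sys.stdout.write("[" + ",".join(map(str, reply)) + "]\n")
    sys.stdout.flush()
'''


def to_text(values: List[int]) -> str:
    """Convert a list of integers to the (assumed) Prolog-style list form."""
    return "[" + ",".join(map(str, values)) + "]"


def from_text(text: str) -> List[int]:
    """Convert a reply such as "[1,2,3]" back to a Python list."""
    body = text.strip().strip("[]")
    return [int(t) for t in body.split(",") if t.strip()]


class PrologCoroutine:
    """Runs the stand-in interpreter as a co-routine connected by two pipes."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def call(self, name: str, args: List[int]) -> List[int]:
        self._proc.stdin.write(f"{name}({to_text(args)})\n")   # down the pipe
        self._proc.stdin.flush()
        reply = self._proc.stdout.readline()                   # back up the pipe
        return from_text(reply)

    def close(self) -> None:
        self._proc.stdin.close()
        self._proc.wait()


if __name__ == "__main__":
    co = PrologCoroutine()
    print(co.call("sort", [3, 1, 2]))   # [1, 2, 3]
    co.close()
```

Flushing after each request matters: without it the request could sit in the parent's buffer while both processes wait on each other.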

To prototype a module with a procedure call interface, the PLEASE specification is transformed into a body and a number of headers. The body contains code in a programming language which may be compiled using standard tools to produce an object file. The headers contain interface specifications, which may be included during the separate compilation of other components which use the body. The object code for the body can then be linked with the object files produced for those components to create an executable system. Using this method we have created systems which integrate modules written in C, Pascal, and Path Pascal with prototypes created from PLEASE specifications.

6. Future Work

Although PLEASE is currently an extension to Path Pascal, the basic specification, verification and prototyping methods are independent of the implementation language used. In the long term, we plan to use Ada⁵ as our implementation language.

⁴ Unix is a trademark of AT&T Bell Laboratories.

At present, the transformation of PLEASE specifications into Prolog code is largely a manual process. We have designed a system to perform many of these transformations automatically. This system will search libraries of specifications and implementations for components to be reused in the prototype being constructed. We hope this will allow the automatic prototyping of a large class of PLEASE programs. We plan to build a prototype implementation to better judge the feasibility of this approach. We also plan to investigate the possibility of extending these tools into an expert system for prototyping. For example, if the system could not find a component with a logically equivalent specification, then specifications with weaker pre-conditions and stronger post-conditions could be considered. The system also might aid in the reconfiguration of prototypes for different operating environments.
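The "weaker pre-condition, stronger post-condition" criterion can be made precise in the usual way: a component C may stand in for a required specification S if pre_S implies pre_C, and pre_S together with post_C implies post_S. The Python sketch below checks these two conditions exhaustively over a small finite domain. The bounded check, the function names and the example specifications are assumptions for illustration, not part of the planned system, and a finite check is of course not a proof.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Value = Tuple[int, ...]
Pre = Callable[[Value], bool]
Post = Callable[[Value, Value], bool]


def can_substitute(pre_spec: Pre, post_spec: Post,
                   pre_comp: Pre, post_comp: Post,
                   domain: Sequence[Value]) -> bool:
    """Bounded check that a component's specification may replace a required one."""
    for x in domain:
        if not pre_spec(x):
            continue
        if not pre_comp(x):             # component pre-condition must be weaker
            return False
        for y in domain:
            if post_comp(x, y) and not post_spec(x, y):
                return False            # component post-condition must be stronger
    return True


def small_lists(max_len: int = 2, values: Sequence[int] = (0, 1, 2)) -> List[Value]:
    """All tuples over `values` of length 0 .. max_len."""
    return [t for n in range(max_len + 1) for t in product(values, repeat=n)]


def always(x: Value) -> bool:
    return True


def is_permutation(x: Value, y: Value) -> bool:
    return sorted(x) == sorted(y)


def is_sorted_permutation(x: Value, y: Value) -> bool:
    return is_permutation(x, y) and all(a <= b for a, b in zip(y, y[1:]))


if __name__ == "__main__":
    domain = small_lists()
    # Required: "any permutation"; library component: "sorted permutation".
    print(can_substitute(always, is_permutation,
                         always, is_sorted_permutation, domain))   # True
    # The other way round fails: a mere permutation need not be sorted.
    print(can_substitute(always, is_sorted_permutation,
                         always, is_permutation, domain))          # False
```

In this reading a sorting component can be reused wherever only "some permutation" is required, but not conversely.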

In the current implementation, prototypes produced from PLEASE specifications run quite slowly because the Prolog code is interpreted and the interface between languages is inefficient. We expect that the performance of these prototypes can be dramatically increased by the use of commercially available Prolog compilers, such as [1], which produce high-quality machine code and provide interfaces to conventional languages. We plan to adapt our implementation for use with a Prolog compiler and continue our efforts to increase the performance of the prototypes produced from PLEASE specifications.

We are investigating the problems involved with the formal verification of systems specified in PLEASE, and plan to investigate the problems encountered in using our methods on large projects. We plan to gain experience by specifying, prototyping, implementing, and verifying a medium-sized system using our methods.

7. Summary and Conclusions

PLEASE is an executable specification language which supports program development by incremental refinement. Software components are first specified using a combination of conventional programming languages and mathematics. These abstract components are then incrementally refined into programs in an implementation language. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications.

⁵ Ada is a trademark of the U.S. Government, Ada Joint Program Office.

Path Pascal is an extension to standard Pascal which supports concurrency and encapsulated data types. PLEASE is an extension to Path Pascal which allows a procedure or function to be specified using pre- and post-conditions written in predicate logic, and an abstract data type to have a type invariant. PLEASE specifications may be used in proofs of correctness, and may also be transformed into executable prototypes.

We believe that the early production of executable prototypes for experimentation and evaluation
will enhance the development process. Prototypes may improve communication between customer and
developer, thereby enhancing the validation process. Prototypes produced from PLEASE specifications
may be used in experiments performed to guide the design process. PLEASE specifications may enhance
the verification phase by providing a framework for the rigorous development of programs. Prototypes
produced from PLEASE specifications at different levels can be run on the same test data and the results
compared; this method can give significant assurance, at low cost, that a refinement is correct. PLEASE
specifications may also be used in formal proofs of correctness. PLEASE prototypes are based on existing
Prolog technology, and their performance will improve as the speed of Prolog implementations increases.
We believe that the use of PLEASE specifications will enhance the design, development, verification and
reuse of software.
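Running two levels of specification on the same test data and comparing the results can be sketched as follows. This is a minimal illustration, not the PLEASE tooling: it assumes a hypothetical abstract specification of sorting ("some ordered permutation of the input"), executed by brute-force search, and a refinement implemented as insertion sort.

```python
import itertools
import random


def abstract_sort(xs):
    """Executable specification: search for a permutation of xs that is ordered.
    Exponential, but it mirrors the specifying predicate directly."""
    for perm in itertools.permutations(xs):
        if all(perm[i] <= perm[i + 1] for i in range(len(perm) - 1)):
            return list(perm)
    raise AssertionError("unreachable: some permutation is always ordered")


def refined_sort(xs):
    """Refinement: insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def compare_levels(spec, impl, cases):
    """Run both levels on the same test data; return the inputs on which they differ."""
    return [c for c in cases if spec(list(c)) != impl(list(c))]


random.seed(0)
cases = [[random.randint(0, 9) for _ in range(random.randint(0, 6))] for _ in range(200)]
print(compare_levels(abstract_sort, refined_sort, cases))  # [] when the refinement agrees
```

Comparing outputs for equality works here because a sorted list is unique; where a specification admits several valid results, the lower-level output would instead be checked against the specification's predicate.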
Abstract

ENCOMPASS, a prototype software development environment, is being constructed from components
built by the SAGA project. Application of SAGA to the major phases of the lifecycle
will be demonstrated through ENCOMPASS. The system will include configuration management;
a software design paradigm based on the Vienna Development Method; executable
specifications; languages which can be used to support modular programming, like Berkeley Pascal
or ADA; verification and validation tools and methods; and basic management tools. ENCOMPASS
is intended to examine many of the requirements for the design of complex software
development environments such as might be used to construct the space station software. It is
intended to be used as a prototype for examining many of the more advanced features that will
be required in future generations of software development environments which support
aerospace applications. In this paper, we describe the framework adopted within ENCOMPASS
to provide automated management. We illustrate the approach with an example taken from
problem tracking and change control during software maintenance.

1.  Introduction. 

Research  into  the  software  development  process  is  required  to  reduce  the  cost  of  producing  software 
and  to  improve  software  quality.  Modern  software  systems,  such  as  the  embedded  software  required  for 
NASA’s  space  station  initiative,  stretch  current  software  engineering  techniques.  Embedded  software 
systems  often  are  large,  must  be  reliable,  and  must  be  maintainable  over  a period  of  decades.  The 
software support environment for building such software systems must ensure a high level of quality
while  enabling  the  embedded  software  and  the  hardware  on  which  the  software  runs  to  change  and  the 
applications  for  which  the  embedded  system  is  designed  to  evolve.  Furthermore,  such  environments 
must  be  cost  effective. 

The SAGA project is investigating the design and construction of software engineering environments
for developing and maintaining aerospace systems and applications software (5,7). The research
includes the practical organisation of the software lifecycle; configuration management; software requirements
specification; executable specifications; design methodologies; programming; verification;
validation and testing; version control; maintenance; the reuse of software; software libraries; documentation
and automated management (5,11,15,17,18,19,23,24,27,28). An overview of the SAGA project
components is shown in Figure 1. The tools and concepts resulting from SAGA are being used to
develop a prototype software development system called ENCOMPASS (28). The ENCOMPASS
software development paradigm is shown in diagrammatic form in Figure 2. Although the research
has developed many general tools and concepts that are independent of the application language and
domain, we hope to extend ENCOMPASS to support the development of large, embedded software
systems written mainly in ADA.

Figure 1: The SAGA workbench components

In  this  paper,  we  study  mechanisms  to  automate  the  management  of  ENCOMPASS  using  a simple 
example  based  on  the  maintenance  activities  of  problem  tracking  and  change  control.  We  describe  the 
prototype configuration management system underlying ENCOMPASS and discuss the interrelationships
between  this  system  and  the  automated  management  mechanisms. 


2.  The  Software  Development  Environment. 

To be effective, a software development environment must actively support the software development
process (5). It must be easier to use the software development tools and the environment than to
use  other  tools  and  a general  operating  system. 

The SAGA project is concerned with software development environments, not with the construction
of a general operating system. We assume that SAGA will be used in conjunction with a general
operating system such as Berkeley UNIX 4.2BSD that provides a hierarchically structured file system,
virtual memory, processing operations, and mail service. Further, we assume that SAGA will be used in
conjunction with an extension of the operating system that supports a networked workstation environment,
perhaps using LINK (25), a kernel-based version of UNIX United (2), that supports transparent
remote network file access, remote spooling and remote processing.

Figure 2: The ENCOMPASS software development paradigm.

The  SAGA  environment  consists  of  a configuration  management  system  and  a workbench  of 
software  development  tools  which  are  used  in  a set  of  development,  management  and  maintenance 
activities. 

The  configuration  management  system  stores  and  structures  the  software  components  developed  by 
a project  which  may  include  programs,  test  data,  documents,  manuals,  designs,  proofs,  specifications, 
and  contracts. 

The development, management and maintenance activities manipulate the software components
being built. They include the actions of the software developers, managers, testers, quality assurance
teams, and librarians, such as the editing, compilation, or testing of a program, formatting of a document,
or delegation of a task.
The workbench of software development tools provides the means by which activities can manipulate
the  software  components.  In  ENCOMPASS  (28),  this  workbench  is  the  set  of  SAGA  tools.  Development, 
management  and  maintenance  activities  interact  with  the  configuration  management  system  through  the 
SAGA  user  interface,  which  includes  the  SAGA  language-oriented  editor  Epos  (5,18). 

3.  The  Software  Lifecycle. 

The SAGA project has adopted a “management by objectives” (14) approach to the definition of
the software lifecycle (1,12). Each phase in the lifecycle is oriented towards satisfying an objective by
producing a milestone. For example, the requirements specification phase produces a set of properties
that the software system to be constructed must satisfy. Validation consists of determining that the
specification of the system satisfies the requirements of the system and provides an important milestone
in the development process. Using PLEASE (27), an executable specification language, validation can
take the form of “testing” or executing the system specification. In a large project such as the space station
software development program, validation may take the form of prototyping using a mixture of
tools including PLEASE, simulation, standardized library routines and walk-throughs.

The design phase consists of incrementally refining the requirements specification into algorithms
and component specifications. It has been shown that neither testing nor formal verification alone can
guarantee correct software (9,10). ENCOMPASS can provide an effective verification process that uses
both testing and formal methods. The execution of the PLEASE specification for a component provides
a test oracle for later use in the verification of refinements. Formal specifications and design
methods also aid software reuse (20,21,22).

In  ENCOMPASS,  we  use  the  specifications  not  only  for  testing,  but  also  as  the  basis  for  rigorous 
and  formal  proofs  of  correctness.  Thus,  we  intend  that  the  system  specification  can  also  be  used  to 
prove  theorems  concerning  the  requirements  of  the  system  and  to  prove  that  a design  or  refinement  step 
correctly  implements  a specification. 

PLEASE is based on specifying programs using pre- and post-conditions. PLEASE specifications
are implemented as an extension of a programming language. Both ADA and Path Pascal (6) are being
used as vehicles for ENCOMPASS. The predicates are transformed into logic programs which are executed
in a Prolog environment (8) that is invoked from the principal programming language. Many of
the transformations may be performed automatically. Research into automating these transformations
continues.

Verification conditions for the refinement of an abstract program into a more concrete one can be
generated during program design. These verification conditions may be inserted into a proof tree, and
TED (15), a proof tree editor, may be used to manipulate them. In particular, TED permits proofs to be
decomposed into sequences of lemmas. Various theorem provers may be invoked to mechanically certify
the verification conditions.
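A proof tree in which a verification condition is decomposed into lemmas can be pictured as a simple recursive structure. This is an illustration only; the `ProofNode` class, and the `certified` flag standing in for a theorem prover's verdict, are assumptions and not TED's actual representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProofNode:
    goal: str                                   # formula to be proved
    lemmas: List["ProofNode"] = field(default_factory=list)
    certified: bool = False                     # a prover (or a person) has discharged this leaf

    def proved(self) -> bool:
        # A leaf counts as proved only once certified. An inner node counts as proved
        # when every lemma it was decomposed into is proved (the decomposition step
        # itself is assumed to be sound).
        if not self.lemmas:
            return self.certified
        return all(lemma.proved() for lemma in self.lemmas)

    def open_goals(self) -> List[str]:
        """Leaves that still need to be discharged."""
        if not self.lemmas:
            return [] if self.certified else [self.goal]
        return [g for lemma in self.lemmas for g in lemma.open_goals()]


vc = ProofNode("verification condition for a refinement step", lemmas=[
    ProofNode("lemma 1", certified=True),
    ProofNode("lemma 2"),
])
print(vc.proved(), vc.open_goals())  # False ['lemma 2']
```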

The development methodology used for refining system specifications into programs is similar to
the Vienna Development Method (16,26). A set of rules specifies the verification conditions that are
required for a given form of refinement. These rules can be applied automatically, but in general,
proving the verification conditions requires some manual effort. Figure 2 summarizes the ENCOMPASS
approach.
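For orientation, the standard VDM proof obligations for one common form of refinement, data reification as presented by Jones, are reproduced below; the rule set actually used in ENCOMPASS is not given here and may differ. Here retr maps a concrete state in C to the abstract state in A that it represents, pre_A and post_A specify an abstract operation, and pre_C and post_C specify its concrete counterpart.

```latex
\begin{align*}
\text{Adequacy:} \quad & \forall a \in A \cdot \exists c \in C \cdot \mathit{retr}(c) = a \\
\text{Domain:}   \quad & \forall c \in C \cdot \mathit{pre}_A(\mathit{retr}(c)) \Rightarrow \mathit{pre}_C(c) \\
\text{Result:}   \quad & \forall c, c' \in C \cdot \mathit{pre}_A(\mathit{retr}(c)) \wedge \mathit{post}_C(c, c')
                         \Rightarrow \mathit{post}_A(\mathit{retr}(c), \mathit{retr}(c'))
\end{align*}
```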

The  use  of  formal  specifications  in  ENCOMPASS  is  encouraged  not  only  to  assist  code  and  design 
reuse,  to  promote  clarity,  to  aid  testing,  and  to  support  verification,  but  also  to  provide  acceptance 
criteria  which  may  be  used  as  management  objectives  for  a design  step.  The  objectives  can  range  from  a 
mechanical  proof  of  the  correctness  of  a design  decision  to  a substantial  set  of  test  data  for  which  the 
design  is  valid. 
Many of the objectives of each software development phase can be made into a milestone by requiring
the activities of the phase to generate a list of documented products. These products must be validated
before the phase is complete to ensure that the phase has been successful. In SAGA and ENCOMPASS,
we can use language-oriented tools such as the Epos editor to further enhance the documentation
of milestones. We believe these tools can automate repetitive effort in preparing and validating the
achievement of objectives (4).

Management for the software development lifecycle must identify, control, and record the development
process. A management model can be based on a trace of the activities within the project. Such a
trace  can  be  used  to  understand  the  meaning  of  management  in  a similar  manner  to  the  use  of  traces  in 
defining  the  meaning  of  a programming  language  (Campbell  and  Lauer  (3)).  In  ENCOMPASS,  we  are 
implementing  a limited  set  of  management  functions  to  record,  monitor,  initiate  activities,  and  inhibit 
inappropriate  activities.  Instead  of  using  a detailed  model  of  management,  we  have  adopted  a simpler 
approach  based  on  the  larger  granularity  provided  by  milestones. 

4.  A Framework  for  Automated  Management. 

The use of a management by objectives approach (14) in the software lifecycle introduces clearly
defined  milestones  that  are  agreed  upon  by  the  developer  and  manager.  The  management  objectives  for 
each  activity  must  define  the  pre-conditions  under  which  the  activity  may  occur,  acceptance  criteria  for 
the  products  produced  by  the  activity,  and  a procedure  for  evaluating  whether  the  acceptance  criteria 
have  been  met.  These  objectives  provide  a framework  around  which  the  management  of  the  software 
project  can  be  automated. 
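One way to picture such an objective is as a record that holds the activity's pre-condition, its acceptance criteria, and an evaluation procedure. The sketch below is hypothetical: the `Objective` class, the `state` dictionary of project facts, and the example criteria are illustrative and are not taken from ENCOMPASS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]   # hypothetical snapshot of facts about the project and its products


@dataclass
class Objective:
    activity: str
    precondition: Callable[[State], bool]            # when may the activity occur?
    criteria: Dict[str, Callable[[State], bool]]     # named acceptance criteria

    def may_start(self, state: State) -> bool:
        return self.precondition(state)

    def evaluate(self, state: State) -> List[str]:
        """Evaluation procedure: the names of the acceptance criteria not yet met."""
        return [name for name, check in self.criteria.items() if not check(state)]


design_step = Objective(
    activity="refine stack specification",
    precondition=lambda s: s.get("spec_approved") is True,
    criteria={
        "all tests pass": lambda s: s.get("tests_failed") == 0,
        "verification conditions proved": lambda s: s.get("open_vcs") == 0,
    },
)

state = {"spec_approved": True, "tests_failed": 0, "open_vcs": 2}
print(design_step.may_start(state), design_step.evaluate(state))
# True ['verification conditions proved']
```

Under this reading, the milestone is reached when `evaluate` returns an empty list.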

A simple demonstration of how effective such a management scheme can be is given by the following
simplified example of managing software maintenance. Figure 3 shows the organizational structure
of  a software  maintenance  group.  Analysts  and  programmers  are  responsible  to  a change  control  board 
for  their  contributions  to  the  maintenance  activity.  Bugs  and  requests  for  modifications  to  maintained 
software  are  received  by  the  maintenance  group.  The  change  control  board  manages  the  manpower  and 
resources  of  the  maintenance  group  and  decides  which  change  requests  should  be  satisfied  and  which 
change  requests  should  be  ignored. 

Figure 4 shows a simplified diagram of the flow of information that occurs within the maintenance
group. Users submit change requests to the maintenance group. The change control board assigns program
change requests to an analyst for further examination. A program change request may consist of a
bug report or a proposal for enhancements to the software. The analyst reviews the requests and produces
program modification plans for those that are valid. These plans are forwarded to the change control
board for approval and scheduling. The change control board may either allocate a programmer to
work on a job specification based on the plan, or it may reject the plan. A rejected plan will be reconsidered
by the analyst.

Figure 3: Organization structure

Figure 4: Data flow for change requests

The  programmer  produces  the  appropriate  software  modifications  and  submits  them  to  the  change 
control  board.  The  board  examines  the  modifications  and  may  either  produce  a new  software  release  or 
generate  a new  job  specification  to  reconsider  the  software  modifications. 
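The flow of a change request described above can be summarised as a small state machine. The state and event names are our own labels for the steps in the text, and treating an invalid request as simply closed is a simplifying assumption.

```python
# Hypothetical encoding of the change-request flow: (state, event) -> next state.
TRANSITIONS = {
    ("submitted", "assign_to_analyst"): "under_analysis",
    ("under_analysis", "declare_invalid"): "closed",              # no plan for invalid requests
    ("under_analysis", "submit_plan"): "plan_review",
    ("plan_review", "reject_plan"): "under_analysis",             # analyst reconsiders the plan
    ("plan_review", "approve_plan"): "in_programming",            # job specification issued
    ("in_programming", "submit_modifications"): "modification_review",
    ("modification_review", "request_rework"): "in_programming",  # new job specification
    ("modification_review", "release"): "released",
}


def run(events, state="submitted"):
    """Replay a sequence of events, rejecting any step the flow does not allow."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run(["assign_to_analyst", "submit_plan", "reject_plan", "submit_plan",
           "approve_plan", "submit_modifications", "release"]))  # released
```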

A more detailed flow diagram for the change requests would include additional feedback stages to
allow analysts and programmers to negotiate their objectives with the change control board. For example,
the programmer may wish to question the time allotted to accomplish the analyst’s plan.

In  ENCOMPASS,  the  management  system  for  change  control  is  implemented  using  SAGA  tools. 
Activities  within  the  change  control  system  are  coordinated  using  a combination  of  notesfiles,  mail, 
makefiles,  and  work  trays. 
4.1. The Notesfile System

The Notesfile system is a distributed project information base constructed for SAGA on the UNIX
operating system (11). A file of notes can be maintained across a network of heterogeneous machines.
Each file of notes has a topic; each note has a title. A sequence of responses is associated with each note.
Notes and responses may be exchanged between separate notesfiles. Notes and responses are documented
with their authors and times of creation. Updates to the notes and responses are transmitted among
networked systems to maintain consistency. Notesfiles use the standard electronic mail facility to facilitate
the updates. A library and standard interface permit any user program to submit a note or
response  to  a notesfile.  This  library  has  been  used  in  the  construction  of  automatic  logging  and  error 
reporting  facilities  in  software  and  test  harnesses.  Within  the  SAGA  project,  we  have  used  the  Notesfile 
system  to  organise  technical  discussions,  product  reviews,  problem  tracking,  agendas  and  minutes, 
grievances,  design  and  specification  documentation,  lists  of  work  to  be  done,  appointments,  news  and 
mail. 
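The structure just described, a notesfile with a topic, titled notes, ordered responses, and an author and creation time on every entry, can be sketched as a small data model. The `submit_note` and `respond` methods merely stand in for the library interface and are hypothetical; they do not reproduce the actual Notesfile library, and propagation of updates by mail is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Entry:
    author: str
    text: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Note(Entry):
    title: str = ""
    responses: List[Entry] = field(default_factory=list)   # kept in order of arrival


@dataclass
class Notesfile:
    topic: str
    notes: List[Note] = field(default_factory=list)

    def submit_note(self, author: str, title: str, text: str) -> Note:
        note = Note(author=author, text=text, title=title)
        self.notes.append(note)
        return note

    def respond(self, note: Note, author: str, text: str) -> Entry:
        reply = Entry(author=author, text=text)
        note.responses.append(reply)
        return reply


# e.g. an automatic error report from a test harness, followed by a human response
problems = Notesfile(topic="Problem Tracking")
n = problems.submit_note("test-harness", "preorder: stack overflow", "case 17 failed")
problems.respond(n, "analyst", "reproduced; plan to follow")
print(len(problems.notes), len(n.responses))  # 1 1
```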


4.2. Work Trays

A work tray is a new mechanism which has been introduced in order to manage and record the allocation,
progress, and completion of work within a software development project. Each user may have a
number of work trays, each of which may contain a number of tasks that contain software products.
Products are stored as entities within the ENCOMPASS configuration management system. There are
three types of trays: input trays, in-progress trays, and file trays. Each user receives tasks in one or
more input trays. The user may then transfer these tasks to an in-progress tray where he will perform
the actions required of him and produce new products. The user may then return the task via a conceptual
output tray to an input tray for the originator of the task. A user may also create new tasks in
in-progress trays that he owns. These tasks may then be transferred to another user's input tray. A task
that has been transferred back into the in-progress tray of the user who created the task may be marked
as complete and transferred to a file tray for long-term storage.

Each task has a home, which is the tray where the task was created, a location, which is the tray
where the task currently resides, and an attribute time, which is the time the last action involving that
task took place. Status commands allow examination of the tasks in a tray and the products in a task.
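The tray rules above can be sketched as follows. This is a minimal model under our own assumptions: each user has a single tray of each kind, identified by an (owner, kind) pair; a logical counter stands in for the time attribute; and the only rule enforced is that a task may be filed only once it is back in the in-progress tray where it was created. All names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

Tray = Tuple[str, str]          # (owner, kind), kind being "input", "in-progress" or "file"
_clock = itertools.count(1)     # logical clock standing in for the real time of each action


@dataclass
class Task:
    name: str
    home: Tray                  # tray where the task was created
    location: Tray              # tray where the task currently resides
    time: int                   # time of the last action involving the task
    products: List[str] = field(default_factory=list)
    complete: bool = False


def create(owner: str, name: str) -> Task:
    tray = (owner, "in-progress")   # new tasks are created in the owner's in-progress tray
    return Task(name, home=tray, location=tray, time=next(_clock))


def transfer(task: Task, to: Tray) -> None:
    task.location = to
    task.time = next(_clock)


def file_task(task: Task) -> None:
    """Mark a task complete and move it to its creator's file tray."""
    # Only a task that is back in the in-progress tray where it was created may be filed.
    if task.location != task.home:
        raise ValueError(f"{task.name} is not back in its home tray")
    task.complete = True
    task.location = (task.home[0], "file")
    task.time = next(_clock)


t = create("manager", "Program Modification 12")
transfer(t, ("analyst", "input"))
transfer(t, ("analyst", "in-progress"))
t.products.append("modification plan")
transfer(t, ("manager", "input"))
transfer(t, ("manager", "in-progress"))
file_task(t)
print(t.location, t.complete)  # ('manager', 'file') True
```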

4.3. Implementation of the Change Control Scheme

User change requests can be generated because of bug reports or user requests for enhanced functionality.
These are sent to the change control system by electronic mail and are stored in a notesfile
“User  Change  Requests”. 

A user  change  request  is  a form  that  can  be  filled  in  manually  using  an  editor  tailored  for  form 
filling  or  can  be  generated  by  software  error  reporting  tools.  It  is  entered  into  the  notesfile  mail  system 
by  standard  mailing  utilities.  In  this  way,  user  change  requests  can  be  generated  from  a wide  range  of 
sources,  some  local  and  some  remote. 

The User Change Requests notesfile is the receiving station for all requests to change the software.
The Change Control Board manager creates a particular “Program Modification” task in an in-progress
tray. In addition to the details extracted from the notesfile, the manager may also add the time
within which, and the urgency with which, a response to the request should be produced. The manager
transfers the task to the “Program Modification Request” input tray of an analyst (see Figure 4). The
analyst will transfer the request to an in-progress tray in order to respond to it. The analyst
may create a product called an “Invalid Request” report as a result of his analysis if he believes that
such a report is appropriate. Alternatively, the analyst may create a detailed description of the steps
needed to implement the change or bug fix. The analyst transfers the task with the analysis of the
request back to the manager’s “Program Modification Plan” input tray. Should the analyst not respond
to the request within a reasonable time, the periodic invocation of consistency checking programs can
automatically detect the delay, enter a complaint in the “Problem Tracking Management” notesfile
(which is not shown), and flag the Program Modification task with an item that documents the warning.
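The periodic consistency check can be pictured as a scan over outstanding tasks that compares the time of the last action with the time the manager allowed. This is a hypothetical sketch: the `Assignment` record, the per-task allowance, and the message format are illustrative assumptions, and posting to the Problem Tracking Management notesfile is represented only by the returned messages.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Assignment:
    name: str
    assignee: str
    last_action: datetime        # the task's "time" attribute
    allowance: timedelta         # time allowed by the manager for a response
    warnings: List[str] = field(default_factory=list)


def check_overdue(tasks: List[Assignment], now: datetime) -> List[str]:
    """Flag each overdue task with a warning item and return the complaint messages."""
    complaints = []
    for t in tasks:
        if now - t.last_action > t.allowance:
            msg = f"{t.name}: no response from {t.assignee} since {t.last_action:%Y-%m-%d}"
            t.warnings.append(msg)       # item documenting the warning on the task
            complaints.append(msg)       # would be posted to the notesfile
    return complaints


tasks = [
    Assignment("PMR-7", "analyst A", datetime(1986, 9, 1), timedelta(days=14)),
    Assignment("PMR-8", "analyst B", datetime(1986, 9, 25), timedelta(days=14)),
]
print(check_overdue(tasks, datetime(1986, 9, 30)))
# ['PMR-7: no response from analyst A since 1986-09-01']
```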

The manager may transfer the task back into his in-progress tray. Depending upon the products
produced by the analyst, he may register the task as completed, transfer it to a file tray, and write a
response to the request in the notesfile stating that further action is unnecessary; he may convene the
change control board; or he may reject the plan and reassign the task to the analyst with recommendations
either to revise the plan or to reject the request.

Should  the  manager  wish  to  review  the  plan,  the  Change  Control  Board  will  be  convened  to  discuss 
the  Program  Modification  Plans.  Alternatively,  the  Board  may  discuss  the  Plans  electronically  through 
the  notesfile  system.  Given  acceptance  of  a plan,  the  manager  of  the  problem  tracking  system  checks 
out  the  products  that  are  needed  to  make  the  modification  from  the  project  library  and  enters  them  into 
the  task.  He  then  transfers  the  task  to  the  “Job  Specification”  input  tray  of  a programmer. 

The  programmer  receives  the  task  and  transfers  it  into  an  in-progress  tray.  The  programmer  will 
add  and  modify  code,  documentation,  test  cases,  and  proofs  of  correctness  to  the  products  of  the  task. 
When  complete,  the  programmer  will  transfer  the  task  to  the  “Software  Modification  Summary”  input 
tray  of  the  manager. 

When  a Software  Modification  Summary  is  received,  the  manager  will  again  convene  the  Change 
Control  Board.  If  the  review  is  satisfactory,  he  will  check  the  new  product  into  the  project  library  as  a 
new  version  of  the  software  and  announce  the  release  of  the  software  through  the  “Software  Release” 
notesfile.  If  the  review  is  unsatisfactory,  he  may  create  a new  Job  Specification. 

At  any  time,  the  manager  or  programmers  may  query  any  of  the  tasks  they  have  been  assigned  or 
have  created.  Acceptance  criteria  may  be  in  the  form  of  executable  procedures  which  produce  reports 
(for  example,  executable  acceptance  tests),  records  of  compilations  or  examinations  of  the  file  activity  of 
program  files.  These  acceptance  criteria  may  be  automatically  stored  as  products  of  the  task.  Status 
commands  will  summarise  such  records,  report  on  who  is  currently  working  on  the  task,  who  is  waiting 
for completion of the task, and which other tasks must be completed before the current task can
be completed.

Thus,  very  simple  mechanisms  can  be  used  to  automate  management,  provided  that  the  objectives 
being managed are well-defined. In the example given, the problem and the resulting corrective maintenance
need to be well-defined. In addition, the corrective maintenance must be validated. A feasibility
study  of  the  work  tray  concept  has  been  completed  and  the  concept  is  being  extended.  In  the  following 
section,  we  discuss  the  interaction  between  maintenance  and  the  configuration  management  system. 

5.  Configuration  Management  System 

The  configuration  management  system  is  responsible  for  maintaining  the  consistency  of,  integrity 
of  and  relationships  between  the  products  of  software  development.  In  the  SAGA  project,  Terwilliger 
and  Campbell  (28)  model  software  configurations  using  a graph  in  which  the  nodes  represent  uniquely 
named  entities  or  uniquely  named  collections  of  entities  and  the  arcs  represent  relationships  between 
entities.  Layers  within  the  graph  represent  different  abstract  properties  of  the  software  products.  The 
graph  also  represents  the  organization  of  the  software  products  into  separate  concerns. 

In ENCOMPASS, software configurations can be decomposed by organizational relationships into
vertical and horizontal structures. The vertical structures form a hierarchy and decompose the system
into independent components. For example, within a software development project, the configuration
may be structured into subsystems. These, in turn, are decomposed into modules which are decomposed
into compilation units.

The horizontal structures represent dependencies between entities at the same hierarchical level.
Thus, each project, subsystem, module, and unit may have a horizontal structure which includes dependencies
between documents, version information, requirements and system specification, shared
definitions, architectural design, detailed design, code, binaries, linked binaries, test cases, procedures for
generating executable binaries, listings, reports, authors, managers, time and tool certification stamps,
development histories, and concurrency control locks. Relationships may specify design, compilation and
version dependencies. Depending upon the granularity of the entities, the graph can be represented by
the UNIX directory structure, by symbolic links, or by databases. For example, in ENCOMPASS the
vertical structure is stored using the UNIX directory structure. Shared definitions are represented by
symbolic links. A database at each level in the vertical structure is being built to provide data dictionary
capabilities and author-manager relations.
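The split into vertical and horizontal structures can be sketched with two kinds of arcs over uniquely named entities. The `ConfigGraph` class and the entity names are hypothetical; as described above, ENCOMPASS itself stores the vertical structure as UNIX directories and shared definitions as symbolic links.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ConfigGraph:
    """Nodes are uniquely named entities; arcs are labelled relationships."""

    def __init__(self) -> None:
        self.children: Dict[str, List[str]] = defaultdict(list)         # vertical structure
        self.deps: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)   # horizontal structure

    def contain(self, parent: str, child: str) -> None:
        self.children[parent].append(child)

    def relate(self, src: str, kind: str, dst: str) -> None:
        self.deps[src].add((kind, dst))

    def components(self, root: str) -> List[str]:
        """All entities below root in the vertical hierarchy, in pre-order."""
        out: List[str] = []
        for child in self.children.get(root, []):
            out.append(child)
            out.extend(self.components(child))
        return out


g = ConfigGraph()
g.contain("project", "subsystem:preorder")
g.contain("subsystem:preorder", "module:stack")
g.contain("module:stack", "unit:stack.spec")
g.contain("module:stack", "unit:stack.body")
g.relate("unit:stack.body", "uses", "unit:stack.spec")   # compilation dependency
print(g.components("subsystem:preorder"))
# ['module:stack', 'unit:stack.spec', 'unit:stack.body']
```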

Abstractions of the collection of software products are provided by views. A view represents a particular
abstract property or concern and is implemented as a mapping from names into products. The
“base view” is a complete collection of the software products. For example, a “functional test” view
might represent the system as a collection of functional specifications, object code, test programs and
test data. Other examples of views include a single version abstraction of a system that has many concurrent
versions, documentation, and the work of a particular developer.
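Since a view is a mapping from names into products, a derived view can be sketched as a selection over the base view. The product names and the `kind` attribute used to define the "functional test" view are hypothetical.

```python
from typing import Callable, Dict

Product = Dict[str, str]     # hypothetical: a product described by its attributes
View = Dict[str, Product]    # a view maps names to products


def derive_view(base: View, keep: Callable[[Product], bool]) -> View:
    """A derived view exposes only the products of the base view that it selects."""
    return {name: p for name, p in base.items() if keep(p)}


base_view: View = {
    "stack.spec":  {"kind": "functional specification"},
    "stack.body":  {"kind": "source"},
    "stack.o":     {"kind": "object code"},
    "stack.test":  {"kind": "test program"},
    "stack.cases": {"kind": "test data"},
    "stack.doc":   {"kind": "documentation"},
}

FUNCTIONAL_TEST = {"functional specification", "object code", "test program", "test data"}
functional_test_view = derive_view(base_view, lambda p: p["kind"] in FUNCTIONAL_TEST)
print(sorted(functional_test_view))
# ['stack.cases', 'stack.o', 'stack.spec', 'stack.test']
```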

Continuing our discussion of change control within maintenance, we consider the problems arising
in modifying an existing program. Figure 5 shows an example tree traversal program stored in an
ENCOMPASS configuration management system (Kirslis et al. (19)). Not all the dependencies and details
are shown. The program is presented as a subsystem containing four modules: preorder, stack, tree, and
item. Each module contains entities including a makefile (Feldman (13)), a specification, a body or source
code, compiled object code, and an executable program. Only one type of relationship is shown, the uses
relationship, which associates an entity with another entity if the former entity references the latter one.
Each “uses” relationship should be accompanied by a “used by” relationship, not shown in the figure,
which is simply the inverse of the “uses” relationship, and which permits the references to a
module/entity to be determined from that module/entity. Each body within a module references its
own specification. The body of preorder references the specifications in the other modules. The makefile
for each module references the specification and body to be compiled, and the compiled object which will
be produced. In addition, the makefile in the preorder module also references the makefiles and objects
in the other modules, since it needs these in order to produce an executable program.

Figure 5: Base view for the preorder program
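Because the "used by" relationship is the inverse of "uses", it can be derived mechanically rather than maintained by hand, and following it transitively yields every entity a change might affect. The sketch encodes only the body-to-specification arcs described in the text; names such as `preorder.body` are our own labels.

```python
from collections import defaultdict
from typing import Dict, Set

# "uses" arcs from the text: each body uses its own specification, and the
# body of preorder also uses the specifications of the other modules.
USES: Dict[str, Set[str]] = {
    "preorder.body": {"preorder.spec", "stack.spec", "tree.spec", "item.spec"},
    "stack.body": {"stack.spec"},
    "tree.body": {"tree.spec"},
    "item.body": {"item.spec"},
}


def invert(uses: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Derive 'used by': b is in used_by[a] exactly when a is in uses[b]."""
    used_by: Dict[str, Set[str]] = defaultdict(set)
    for src, targets in uses.items():
        for dst in targets:
            used_by[dst].add(src)
    return dict(used_by)


def affected_by(entity: str, used_by: Dict[str, Set[str]]) -> Set[str]:
    """All entities that directly or transitively use `entity`."""
    seen: Set[str] = set()
    frontier = [entity]
    while frontier:
        for user in used_by.get(frontier.pop(), set()):
            if user not in seen:
                seen.add(user)
                frontier.append(user)
    return seen


USED_BY = invert(USES)
print(sorted(affected_by("stack.spec", USED_BY)))  # ['preorder.body', 'stack.body']
```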

A number of benefits are realized if this dependency graph is stored in machine accessible form and
if the software tools in use are adapted to refer to the graph. A data retrieval tool can provide information
about the hierarchical structure of the program. For a given module, the tool can show its dependencies
with respect to other modules.

An  editor,  adapted  to  use  this  graph,  can  permit  a programmer  to  specify  a routine,  module,  or 
program  to  edit.  If  the  programmer  specifies  a module,  that  module  becomes  the  locus  at  the  beginning 
of  the  editing  session.  The  programmer  edits  within  the  context  of  that  layer  of  abstraction.  Only  the 
local  context  of  the  module  is  important.  The  programmer  can  find  and  display  other  modules,  routines, 
or  programs  which  use  this  module.  These  references  may  be  checked  easily  to  determine  how  a change 
in  the  current  module  will  affect  them.  Similarly,  other  modules  that  are  referenced  by  the  modules 
which  reference  the  module  under  consideration  may  be  located  easily  and  displayed. 

Compilation tools which access the dependency graph can support automatic, incremental recompilation
on a module-by-module basis. For example, since the body of preorder depends on the
specification for stack, if the specification for stack has been changed since the time preorder was last
compiled, then preorder will be recompiled. A compilation tool can use the dependency graph to resolve
the dependencies at compile time and access all files needed to perform a compilation.
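The recompilation rule in this example is the familiar make-style timestamp test applied to the dependency graph. A minimal sketch, assuming modification times are available as plain numbers and using hypothetical entity names:

```python
from typing import Dict, List, Set

# Hypothetical compile dependencies: object -> entities it is compiled from.
DEPENDS: Dict[str, Set[str]] = {
    "preorder.o": {"preorder.body", "preorder.spec", "stack.spec", "tree.spec", "item.spec"},
    "stack.o": {"stack.body", "stack.spec"},
}


def needs_recompile(mtime: Dict[str, float]) -> List[str]:
    """Objects that are missing or older than any entity they depend on."""
    stale = []
    for obj, sources in sorted(DEPENDS.items()):
        built = mtime.get(obj)
        if built is None or any(mtime[s] > built for s in sources):
            stale.append(obj)
    return stale


mtime = {"preorder.body": 1, "preorder.spec": 1, "tree.spec": 1, "item.spec": 1,
         "stack.body": 1, "stack.spec": 5,     # stack specification edited at time 5
         "preorder.o": 3, "stack.o": 3}
print(needs_recompile(mtime))  # ['preorder.o', 'stack.o']
```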

Versions  of  the  preorder  program  are  stored  in  a program  library  (28).  Conceptually,  these  ver- 
sions appear  in  the  library  as  independent  entities  as  depicted  in  Figure  8.  The  versions  are,  however, 
interdependent  because  of  the  history  of  their  construction.  The  versions  are  constructed  from  revision 


• • • 


Global  Library 


Figure  6:  Global  library  containing  versions  of  preorder  base  view 
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control  histories  of  preorder.  Each  preorder  version  is  a collection  of  versions  of  the  other  modules. 
Each  version  of  a module  is  stored  using  a history  mechanism  based  on  the  Revision  Control  System 
(RCS)  of  Tichy  (29).  Similarly,  the  information  describing  a version  of  preorder  is  also  stored  under 
RCS. Library makefiles construct a specific version of preorder within the library on a demand or
check-out basis. The specific version of preorder specifies which version of each module must be
extracted. The dependencies between modules and within modules are recorded in a format that can be
stored within RCS. (In our prototype ENCOMPASS environment, these dependencies are recorded using
the UNIX tape archiving facility tar and placed directly under RCS.)

To  modify  preorder,  a read-only  copy  of  the  latest  version  of  preorder  is  checked  out.  This  version 
is  still  under  configuration  management  and  resides  within  the  protection  provided  by  the  global  library. 
Figure 7 shows how a view of preorder is constructed in a workspace. The workspace facilitates
changing preorder. Each entity within preorder can be accessed read-only through this view. In the
ENCOMPASS prototype, the view is implemented as a hierarchical directory structure which initially
contains only symbolic links to the base view stored in the library.

Figure 7: Workspace containing view of preorder
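The workspace mechanism lends itself to a short sketch. The Python fragment below assumes that the library's base view is an ordinary directory tree of files; the functions `make_view` and `check_into_workspace` and the file names are hypothetical and do not reproduce the ENCOMPASS implementation. It builds a view as a mirror directory of symbolic links and then replaces one link with a private copy, which is how the prototype prepares an entity for modification.

```python
import os
import shutil
import tempfile

# Hypothetical helpers illustrating a symbolic-link view; not the ENCOMPASS code.


def make_view(library_root: str, workspace_root: str) -> None:
    """Mirror the library's directory tree, with a symbolic link for every file."""
    for dirpath, _dirnames, filenames in os.walk(library_root):
        rel = os.path.relpath(dirpath, library_root)
        target_dir = os.path.normpath(os.path.join(workspace_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.symlink(os.path.join(dirpath, name), os.path.join(target_dir, name))


def check_into_workspace(workspace_root: str, entity: str) -> None:
    """Replace the link for `entity` with a private, writable copy of its target."""
    path = os.path.join(workspace_root, entity)
    if not os.path.islink(path):
        return  # already a private copy
    source = os.path.realpath(path)
    os.remove(path)
    shutil.copy2(source, path)
    os.chmod(path, 0o644)


# Usage: a tiny library holding module `item` of preorder (file names are invented).
library = tempfile.mkdtemp()
workspace = tempfile.mkdtemp()
os.makedirs(os.path.join(library, "item"))
for name in ("item.spec", "item.body"):
    with open(os.path.join(library, "item", name), "w") as f:
        f.write(name + " contents\n")

make_view(library, workspace)
check_into_workspace(workspace, os.path.join("item", "item.body"))
print(os.path.islink(os.path.join(workspace, "item", "item.spec")))  # True
print(os.path.islink(os.path.join(workspace, "item", "item.body")))  # False
```

Entities that are never modified remain links, so the workspace costs little until a change is actually made.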

In  order  to  modify  components  of  preorder,  the  entities  concerned  are  checked  into  the  workspace. 
In  terms  of  implementation,  the  symbolic  links  are  replaced  by  copies  of  the  actual  entities  to  which  they 
correspond.  Figure  8 shows  a new  version  of  preorder  being  developed  in  which  two  entities  within 
module  item  are  being  modified.  If  the  new  version  being  developed  is  a sequential  revision  of  preorder, 
locks  are  placed  within  the  library  on  those  modules  checked  into  the  workspace.  These  locks  prevent 
any parallel development of the same entities. The next version numbers of preorder and of the modules
concerned are then assigned. If the new version is instead a parallel revision of preorder, locks are not
imposed  but  parallel  revision  version  numbers  for  preorder  and  the  modules  concerned  are  assigned. 
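The difference between sequential and parallel revisions can be sketched as follows. The text does not fix a numbering scheme, so this Python fragment assumes RCS-style numbers, in which a sequential revision increments the last component (1.3 to 1.4) and a parallel revision opens a branch off its base revision (1.1 to 1.1.1.1, then 1.1.2.1 for a second branch). The `Library` class and its fields are hypothetical; a real library would also release locks and record the new revisions on check-in.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical global library: latest revision, locks, and branch counts."""
    latest: dict[str, str]                                  # module -> e.g. "1.3"
    locks: dict[str, str] = field(default_factory=dict)     # module -> developer
    branches: dict[str, int] = field(default_factory=dict)  # revision -> branches

    def start_revision(self, module: str, developer: str, parallel: bool) -> str:
        """Return the revision number reserved for a new version of `module`."""
        base = self.latest[module]
        if not parallel:
            # Sequential revision: lock the module, then take the next trunk number.
            holder = self.locks.get(module)
            if holder is not None and holder != developer:
                raise RuntimeError(f"{module} is locked by {holder}")
            self.locks[module] = developer
            trunk, level = base.rsplit(".", 1)
            return f"{trunk}.{int(level) + 1}"
        # Parallel revision: no lock; open a new RCS-style branch off the base revision.
        count = self.branches.get(base, 0) + 1
        self.branches[base] = count
        return f"{base}.{count}.1"


lib = Library(latest={"item": "1.3", "tree": "1.1"})
print(lib.start_revision("item", "alice", parallel=False))  # 1.4
print(lib.start_revision("tree", "bob", parallel=True))     # 1.1.1.1
print(lib.start_revision("tree", "carol", parallel=True))   # 1.1.2.1
try:
    lib.start_revision("item", "bob", parallel=False)
except RuntimeError as err:
    print(err)                                              # item is locked by alice
```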
Figure 8: Workspace containing new version of preorder


Figure  9:  New  version  of  preorder  installed  in  global  library 

Once the development and testing of a new version is complete, the programmer submits a summary of
the modifications to the change control board. The change control board evaluates the
modifications  and  makes  a recommendation  as  to  whether  the  work  constitutes  a valid  version.  (In  a 
more  complex  change  control  system,  the  evaluation  of  the  new  software  might  be  performed  by  a qual- 
ity assurance  group.  Our  management  model  and  implementation  are  easy  to  extend  to  permit  such  a 
system.)  Following  a software  release,  the  new  version  is  integrated  into  the  library  system,  as  shown  in 
Figure  9,  and  the  RCS  files  of  the  individual  modules  that  are  altered  are  updated. 

6.  Summary 

This  paper  describes  a prototype  management  system  that  has  been  constructed  on  UNIX  as  part 
of  the  ENCOMPASS  environment.  The  example  change  control  system  described  has  been  built  using 
the  system.  The  prototype  system  demonstrates  the  feasibility  of  the  approach,  but  further  research  and 
refinement  are  required  to  develop  a practical  management  system. 

The  prototype  implementation  is  not  robust  and  offers  no  protection  from  misuse.  A complete  log 
of  the  actions  performed  on  the  tasks  should  be  kept  in  a secure  location  to  support  auditing.  Further, 
the  implementation  has  limited  goals  and  is  not  fully  integrated  into  the  SAGA  set  of  tools  and  the 
configuration system. The system permits a task to be decomposed into subtasks but does not yet
record those relationships, as it should. Finally, the system ought to be coupled to management tools
such as report generators, PERT chart analyzers and flow charting displays.

However,  the  approach  is  simple  and  provides  a framework  for  building  automated  management. 
We  believe  our  approach  can  be  refined  into  a production  quality  system  for  managing  software  projects. 
We  shall  be  exploring  refinements  of  our  approach  to  accomplish  this  end. 
Abstract. Large-scale software development is so expensive that new
techniques and methods are required to improve productivity. The software
development  environment  is  a proposed  solution  in  which  software 
development  methods  and  paradigms  are  embedded  within  a computer 
software  system.  The  goal  of  an  environment  is  to  provide  software 
developers  with  a computer-aided  specification,  design,  coding,  testing 
and  maintenance  system  that  operates  at  the  level  of  abstraction  of  the 
software  development  process  and  the  application  domains  of  its  intended 
products. 

Proposed  software  development  environments  range  from  simple  collec- 
tions of  software  tools  that  enhance  the  development  process  to  complex 
systems  that  support  sophisticated  software  production  methods.  Every 
environment  must  include  a representation  for  the  eventual  software  pro- 
ducts and  a,  perhaps  informal,  notion  of  the  software  development  pro- 
cess. In  the  SAGA  project,  we  have  been  investigating  the  principles  and 
practices  underlying  the  construction  of  a software  development  environ- 
ment. In  this  paper,  we  review  our  studies  and  results  and  discuss  the  is- 
sues of  providing  practical  environments  in  the  short  and  long  term. 

1.  Introduction 

Research  into  software  development  is  required  to  reduce  the  cost  of  produc- 
ing software  and  to  improve  software  quality.  Modern  software  systems,  such  as 
the  embedded  software  required  for  NASA’s  space  station  initiative,  stretch 
current software engineering techniques. The need to build large, reliable,
and maintainable software systems increases with time. Much theoretical
and  practical  research  is  in  progress  to  improve  software  engineering  techniques. 
One  such  technique  is  to  build  a software  system  or  environment  which  directly 
supports  the  software  engineering  process.  In  this  paper,  we  will  describe  research 
in  the  SAGA  project  to  design  and  build  a software  development  environment 
which  automates  the  software  engineering  process. 

The  design  of  a computer-aided  software  development  environment  should 
be  guided  by  the  problems  that  arise  in  manual  software  development  methods. 
Many  of  these  problems  are  reflected  in  software  cost  estimation  models  and 


measurements (Boehm (4)). A major proportion of the cost of a software system
lies in maintenance (60%) and testing (20%). Fairley (13) comments that
software  costs  are  very  sensitive  to  mistakes  in  the  early  requirements  and  design 
phases  of  development.  Sackman  et  al  (37)  and  Myers  (32)  have  demonstrated 
that  programmers  and  program  testers  vary  greatly  in  the  productivity  and  qual- 
ity of  their  work.  However,  high-level  languages  and  software  tools  to  support 
development  may  increase  the  productivity  of  a programmer  by  as  much  as  222% 
(4).  Orders  of  magnitude  improvement  in  the  productivity  of  software  engineers 
might  be  achieved  in  many  application  areas  if  the  products  of  software  engineer- 
ing can  become  reusable,  that  is,  if  the  requirements,  design,  documentation,  vali- 
dation, and  verification  of  a software  system  can  be  reused  in  maintenance  and  in 
building  new  systems. 

The  SAGA  project  is  investigating  the  design  and  construction  of  practical 
software  engineering  environments  for  developing  and  maintaining  aerospace  sys- 
tems and  applications  software  (Campbell  and  Kirslis  (8)).  The  research  includes 
the  practical  organization  of  the  software  lifecycle,  configuration  management, 
software  requirements  specification,  executable  specifications,  design  methodolo- 
gies, programming,  verification,  validation  and  testing,  version  control,  mainte- 
nance, the  reuse  of  software,  software  libraries,  documentation  and  automated 
management.  The  research  is  documented  in  the  mid-year  report  (Campbell  et  al 
(10)).  An  overview  of  the  SAGA  project  components  is  shown  in  Fig.  1. 

In  this  paper,  we  will  argue  for  research  into  formal  models  of  the  software 
development  process.  Such  formal  models  should  aid  experimental  evaluation  of 
the  practical  techniques  that  are  used  in  the  construction  of  software  development 
environments.  The  SAGA  project  is  developing  models  of  configuration,  design, 
incremental  development,  and  management.  The  concepts  and  tools  resulting 
from  SAGA  are  being  used  to  develop  a prototype  software  development  system 
called  ENCOMPASS  (Terwilliger  and  Campbell,  (41)).  Although  the  research  has 
developed  many  general  tools  and  concepts  that  are  independent  of  the  applica- 
tion language  and  domain,  we  hope  to  extend  ENCOMPASS  to  support  the 
development of large, embedded software systems written mainly in Ada.

2.  The  Requirements  of  a Software  Engineering  Environment 

Practical  software  development  environments  will  be  used  by  software 
developers and software managers with several years' experience in software
development.  Although  some  components  of  the  system  may  be  used  as  educa- 
tional tools,  this  is  not  a major  goal.  The  requirements  for  a practical  software 
development  environment  can  be  structured  into  three  components: 

1. the organization and representation of software products produced by the
development process (the configuration management system),

2. the software development processes (the lifecycle model, software
development, management, and methodologies), and

3. the tools by which software development processes interface to, name, and
manipulate software products.


Fig. 1 The SAGA Workbench Components

Guiding  the  selection  of  requirements  for  each  of  these  components,  we  pro- 
pose the  following  principles: 

1.  A formal  basis  should  be  provided  for  the  software  environment  and  its  com- 
ponents. This  basis  should  serve  to  validate  the  software  development  para- 
digms and  methodologies  used  in  the  environment  and  also  verify  the  correct 
operation  of  the  components.  The  formal  basis  should  allow  the  specification 
of  such  concepts  as  the  model  of  the  software  lifecycle  in  use,  the  design 
methodologies,  maintenance  methods  as  well  as  the  dependency  relationships 
between  products  of  software  development  (including  requirements 
specifications, design, tests, documentation, problem tracking, as well as code
and versions).

2.  Management  by  objectives.  Each  software  engineering  task  should  have 
well-defined  goals,  participants,  and  managers.  The  developers  should  be 
able  to  interact  with  their  managers  in  refining  these  goals  (Gunther  (17)). 
The  task  should  produce  clearly  identified  software  products  which  may  be 
validated  or  verified  with  respect  to  the  goals  of  the  task  (Lehman  et  al  (30)) 
and  a method  of  certifying  that  the  validation  or  verification  has  occurred. 


3.  Automated  management  aids  should  provide  a project  manager  with  tools 
which  summarize  project  activity  and  progress.  A project  manager  should 
be  allowed  to  review  the  progress  of  the  project  in  detail  or  in  summary  at 
any  time. 

4.  Automated  development  tools  should  actively  support  software  development 
and  enhance  the  software  developer’s  abilities.  Campbell  and  Kirslis  (8) 
argue that a software developer must be convinced that a task can be performed
better using a tool than without it, irrespective of what other services
the  tool  might  provide. 

5.  Automated  quality  control  tools  should  permit  inspections  and  audits  of  the 
derivation  of  any  software  product.  This  should  include  examination  of  any 
certification  process,  audits  of  the  software  development  process,  and  ana- 
lyses of  the  project  management.  Tools  should  also  support  the  verification 
that  a software  product  or  development  process  meets  appropriate  accep- 
tance criteria  and  that  the  configuration  management  system  is  kept  con- 
sistent and  up  to  date. 

Many  of  the  principles  require  further  research.  In  the  following  sections,  we 
discuss  the  state  of  our  current  research  in  applying  these  principles  to  the  con- 
struction of  software  systems. 

3.  Configuration  Management  System 

The  configuration  management  system  is  responsible  for  maintaining  the 
consistency  of,  integrity  of  and  relationships  between  the  products  of  software 
development.  In  the  SAGA  project,  Terwilliger  and  Campbell  (41)  model  the 
configuration  management  system  using  a graph  in  which  the  nodes  represent 
uniquely  named  entities  or  uniquely  named  collections  of  entities  and  the  arcs 
represent  relationships  between  entities.  Layers  within  the  graph  represent 
different  abstract  properties  of  the  software  products.  The  graph  also  represents 
the  organization  of  the  software  products  into  separate  concerns. 
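The graph model can be illustrated with a small data structure. This Python sketch is an illustration under stated assumptions rather than the ENCOMPASS representation: nodes are plain strings whose uniqueness is enforced on insertion, each arc carries a relationship label, and a layer is modelled as a tag attached to nodes. Names such as `stack/spec` are invented for the example.

```python
class ConfigGraph:
    """Sketch of the configuration model: uniquely named nodes, labelled arcs for
    relationships, and layers grouping nodes by the abstract property they show."""

    def __init__(self) -> None:
        self.layers_of: dict[str, set[str]] = {}      # node name -> its layers
        self.arcs: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def add_node(self, name: str, *layers: str) -> None:
        if name in self.layers_of:
            raise ValueError(f"duplicate entity name: {name}")
        self.layers_of[name] = set(layers)

    def relate(self, source: str, relation: str, target: str) -> None:
        for end in (source, target):
            if end not in self.layers_of:
                raise KeyError(f"unknown entity: {end}")
        self.arcs.add((source, relation, target))

    def layer(self, name: str) -> set[str]:
        """All nodes belonging to one layer of the graph."""
        return {node for node, layers in self.layers_of.items() if name in layers}

    def targets(self, source: str, relation: str) -> set[str]:
        """Nodes reached from `source` by arcs labelled `relation`."""
        return {t for s, r, t in self.arcs if s == source and r == relation}


g = ConfigGraph()
g.add_node("stack/spec", "base", "test")
g.add_node("stack/body", "base")
g.add_node("preorder/body", "base")
g.relate("preorder/body", "uses", "stack/spec")
g.relate("stack/body", "uses", "stack/spec")
print(sorted(g.layer("test")))             # ['stack/spec']
print(g.targets("preorder/body", "uses"))  # {'stack/spec'}
```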

The  configuration  system  for  ENCOMPASS  can  be  decomposed  by  organiza- 
tional relationships  into  vertical  and  horizontal  structures.  The  vertical  struc- 
tures form  a hierarchy.  For  example,  within  a software  development  project,  the 
configuration  may  be  structured  into  subsystems.  These,  in  turn,  are  decomposed 
into  modules  which  are  decomposed  into  compilation  units. 

The  horizontal  structures  represent  attributes  of  the  hierarchy.  Thus,  each 
project,  subsystem,  module,  and  unit  may  have  an  attribute  for  documentation, 
version  information,  requirements  specification,  shared  definitions,  architectural 
design,  detailed  design,  code,  binaries,  linked  binaries,  test  cases,  procedures  for 
generating  executable  binaries,  listings,  reports,  authors,  managers,  time  and  tool 
certification  stamps,  development  histories,  and  concurrency  control  locks. 
Interattribute  relationships  specify  design,  compilation  and  version  dependencies. 
Depending  upon  the  granularity  of  the  entities,  the  graph  can  be  represented  by 
the  UNIX  directory  structure,  by  symbolic  links,  or  by  databases.  For  example, 
in ENCOMPASS the vertical structure is stored using the UNIX directory structure.
Shared definitions are represented by symbolic links. A database at each
level in the vertical structure is being built to provide data dictionary capabilities
and author-manager relations.

Abstractions  of  the  collection  of  software  products  are  provided  by  views. 
The  “base  view”  is  a complete  collection  of  the  software  products  and  other 
views.  A “view”  is  a layer  in  the  graph  which  represents  a particular  abstract 
property  or  concern.  For  example,  a “functional  test”  view  might  represent  the 
system  as  a collection  of  functional  specifications,  object  code,  test  programs  and 
test  data.  Other  examples  of  views  include  a single  version  abstraction  of  a sys- 
tem that  has  many  concurrent  versions,  documentation,  and  the  work  of  a partic- 
ular developer. 
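A view as a projection of the base view can be sketched in a few lines. Here, purely as an assumption, each entity carries a single kind tag and a view is defined by the set of kinds it retains; real views could also rename or restructure entities, which this sketch omits. The entity names are invented.

```python
# Hypothetical base view: every entity of one module, tagged with its kind.
BASE_VIEW: dict[str, str] = {
    "stack/spec": "specification",
    "stack/body": "source code",
    "stack/stack.o": "object code",
    "stack/test_driver": "test program",
    "stack/test_data": "test data",
    "stack/notes": "documentation",
}

# A view is characterised here by the kinds of entity it keeps.
FUNCTIONAL_TEST_VIEW = {"specification", "object code", "test program", "test data"}
DOCUMENTATION_VIEW = {"specification", "documentation"}


def project(base: dict[str, str], kinds: set[str]) -> dict[str, str]:
    """Project the base view onto the layer containing only the given kinds."""
    return {name: kind for name, kind in base.items() if kind in kinds}


print(sorted(project(BASE_VIEW, FUNCTIONAL_TEST_VIEW)))
# ['stack/spec', 'stack/stack.o', 'stack/test_data', 'stack/test_driver']
print(sorted(project(BASE_VIEW, DOCUMENTATION_VIEW)))
# ['stack/notes', 'stack/spec']
```

The functional test view hides the source code, which is one way a view abstracts details of the base view.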

Fig.  3 shows  an  example  tree  traversal  program  stored  in  an  ENCOMPASS 
configuration  management  system  (Kirslis  et  al  (26)).  It  shows  a base  view,  which 
includes  all  the  details  of  the  software,  and  a test  view,  which  is  a projection  onto 
the  base  view  that  abstracts  some  of  the  details  of  the  base  view  and  supports  the 
testing of the software. Not all the dependencies and details are shown. The program is presented as a
subsystem containing four modules: preorder, stack, tree, and item. Each module contains entities
including a makefile (Feldman (14)), specification, body or source code, compiled object code,
executable program, test specifications, test body, test makefile, compiled test object, executable test
program, and test data.

Fig. 3 A view of the preorder program. Entities: specification, body, makefile, compiled object,
executable program, test specification, test body, test makefile, compiled test object, executable test
program, test data. Relations: uses (solid arrow); projects onto (dashed arrow).

Only one type of relationship is shown, the uses relationship, which associates an entity with another
entity if the former entity references
the  latter  one.  Each  “uses”  relationship  should  be  accompanied  by  a “used  by” 
relationship, not shown in the figure, which is simply the inverse of the “uses”
relationship,  and  which  permits  the  references  to  a module/entity  to  be  deter- 
mined from  that  module/entity.  Each  body  within  a module  references  its  own 
specification.  The  body  of  preorder  references  the  specifications  in  the  other 
modules.  The  makefile  for  each  module  references  the  specification  and  body  to 
be  compiled,  and  the  compiled  object  which  will  be  produced.  In  addition,  the 
makefile  in  the  preorder  module  also  references  the  makefiles  and  objects  in  the 
other  modules,  since  it  needs  these  in  order  to  produce  an  executable  program. 
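Because "used by" is simply the inverse of "uses", a tool need only store one direction and derive the other. The Python sketch below encodes the arcs just described for preorder, stack, tree, and item, using invented entity names such as `stack.spec` and `stack.o`, and computes the inverse; it is an illustration, not the ENCOMPASS data format.

```python
from collections import defaultdict

# Entity names such as "stack.spec" and "stack.o" are invented for this sketch.
MODULES = ["preorder", "stack", "tree", "item"]


def build_uses() -> dict[str, set[str]]:
    """The 'uses' arcs described for the preorder example."""
    uses: dict[str, set[str]] = defaultdict(set)
    for m in MODULES:
        uses[f"{m}.body"].add(f"{m}.spec")                # body uses its own spec
        uses[f"{m}.make"] |= {f"{m}.spec", f"{m}.body", f"{m}.o"}
    for m in MODULES[1:]:
        uses["preorder.body"].add(f"{m}.spec")            # the other modules' specs
        uses["preorder.make"] |= {f"{m}.make", f"{m}.o"}  # needed to link preorder
    return dict(uses)


def invert(uses: dict[str, set[str]]) -> dict[str, set[str]]:
    """Derive 'used by' as the inverse of 'uses'."""
    used_by: dict[str, set[str]] = defaultdict(set)
    for source, targets in uses.items():
        for target in targets:
            used_by[target].add(source)
    return dict(used_by)


uses = build_uses()
used_by = invert(uses)
print(sorted(used_by["stack.spec"]))  # ['preorder.body', 'stack.body', 'stack.make']
```

An editor or data retrieval tool could consult `used_by` to list every entity affected by a change to a given entity.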

A number  of  benefits  are  realized  if  this  dependency  graph  is  stored  in 
machine  accessible  form  and  if  the  software  tools  in  use  are  adapted  to  refer  to 
the  graph.  A data  retrieval  tool  can  provide  information  about  the  hierarchical 
structure  of  the  program.  For  a given  module,  the  tool  can  show  its  dependencies 
with  respect  to  other  modules. 

An  editor,  adapted  to  use  this  graph,  can  permit  a programmer  to  specify  a 
routine,  module,  or  program  to  edit.  If  the  programmer  specifies  a module,  that 
module  becomes  the  locus  at  the  beginning  of  the  editing  session.  The  program- 
mer edits  within  the  context  of  that  layer  of  abstraction.  Only  the  implementa- 
tion details  of  the  module  are  important.  The  programmer  can  find  and  display 
other  modules,  routines,  or  programs  which  use  this  module.  These  references 
may  be  checked  easily  to  determine  how  a change  in  the  current  module  will 
affect  them.  Similarly,  other  modules  which  this  module  references  may  be 
located  easily  and  displayed. 

Compilation  tools,  which  access  the  dependency  graph,  can  support 
automatic, incremental recompilation on a module-by-module basis. For example,
since the body of preorder depends on the specification for stack, if the
specification for stack has been changed since the time preorder was last compiled,
then preorder will be recompiled. A compilation tool can use the dependency
graph to resolve the dependencies at compile time and access all files needed to
perform a compilation.¹
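The recompilation rule in this example is essentially the timestamp comparison that make (Feldman (14)) performs: a target is rebuilt when it is missing or older than anything it depends on. The Python sketch below assumes modification times are available for every entity; the dependency table and the times are made up for illustration.

```python
# Hypothetical compilation dependencies for the preorder example: each object file
# depends on its own specification and body, and preorder.o also depends on the
# specifications of the modules it uses.
DEPS: dict[str, list[str]] = {
    "preorder.o": ["preorder.spec", "preorder.body", "stack.spec", "tree.spec", "item.spec"],
    "stack.o": ["stack.spec", "stack.body"],
    "tree.o": ["tree.spec", "tree.body"],
    "item.o": ["item.spec", "item.body"],
}


def needs_recompile(target: str, deps: dict[str, list[str]], mtime: dict[str, float]) -> bool:
    """True if `target` was never built or is older than one of its dependencies."""
    built = mtime.get(target)
    if built is None:
        return True
    return any(mtime[d] > built for d in deps.get(target, []))


# Every source last changed at t=10 and every object was built at t=20 ...
mtime = {src: 10.0 for srcs in DEPS.values() for src in srcs}
mtime.update({obj: 20.0 for obj in DEPS})
# ... then the specification of stack is edited at t=30.
mtime["stack.spec"] = 30.0

print(sorted(t for t in DEPS if needs_recompile(t, DEPS, mtime)))  # ['preorder.o', 'stack.o']
```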

Test  tools  can  use  the  dependency  graph  to  provide  incremental,  hierarchical 
testing  for  modular  programs.  A test  suite  and  driver  may  be  associated  with 
each  module.  A program  can  then  be  incrementally  tested  in  a bottom  up 
manner,  that  is,  all  modules  referenced  by  module  A will  be  tested  before  module 
A is  tested.  If  any  of  the  referenced  modules  fail  their  tests  then  the  system  can 
print  an  appropriate  message  and  terminate  the  testing  session.  If  the  test  driver, 
test  suite,  or  module  has  not  been  changed  since  the  tests  were  last  run,  the  sys- 
tem can  report  the  previous  results  without  rerunning  the  tests. 
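The bottom-up strategy can be sketched as a depth-first traversal of the references. In this hypothetical Python fragment, which is not the ENCOMPASS test harness, `refs` gives the modules each module references, `changed` holds the modules whose driver, test suite, or code changed since their last run, and `cache` holds previous results. The sketch assumes the reference graph is acyclic, as a module hierarchy normally is.

```python
from typing import Callable


class SessionAborted(Exception):
    """Raised when a module fails, ending the testing session."""


def run_module_tests(
    module: str,
    refs: dict[str, list[str]],
    run_tests: Callable[[str], bool],
    changed: set[str],
    cache: dict[str, bool],
    log: list[str],
) -> None:
    """Test every module that `module` references, then `module` itself."""
    for dep in refs.get(module, []):
        run_module_tests(dep, refs, run_tests, changed, cache, log)
    if module in cache and module not in changed:
        passed = cache[module]                 # nothing changed: reuse the result
        log.append(f"{module}: previous result, {'pass' if passed else 'fail'}")
    else:
        passed = run_tests(module)             # driver, suite, or module changed
        cache[module] = passed
        changed.discard(module)
        log.append(f"{module}: ran, {'pass' if passed else 'fail'}")
    if not passed:
        raise SessionAborted(f"{module} failed its tests; session terminated")


refs = {"preorder": ["stack", "tree", "item"]}
cache = {"stack": True, "tree": True, "item": True}

log: list[str] = []
run_module_tests("preorder", refs, lambda m: True, {"item", "preorder"}, cache, log)
print(log)
# ['stack: previous result, pass', 'tree: previous result, pass',
#  'item: ran, pass', 'preorder: ran, pass']

log2: list[str] = []
try:
    run_module_tests("preorder", refs, lambda m: m != "tree", {"tree"}, cache, log2)
except SessionAborted as err:
    print(err)  # tree failed its tests; session terminated
print(log2)     # ['stack: previous result, pass', 'tree: ran, fail']
```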

¹ In practice, by using UNIX we can do better than this. By an appropriate implementation
of  the  source  dependency  information,  we  can  make  it  appear  as  though  all  files  needed  for  a com- 
pilation are  resident  in  one  place,  permitting  us  to  use  an  existing  makefile  interpreter  program 
and  compiler  without  modification  (26). 


Fig.  3 includes  a test  view  which  might  be  used  by  a quality  assurance  team 
to  test  preorder  after  it  has  been  completed.  The  test  view  contains  a module 
corresponding  to  each  code  module  in  the  base  view.  The  dashed  arrows 
represent  the  projection  relationship  which  shows  the  correspondence  between 
entities in the test and base views. Each projection relationship is accompanied by
an  abstraction  relationship,  not  shown  in  the  figure,  which  is  its  inverse.  Each 
module  in  the  test  view  contains  the  specification  of  the  code  module  to  be  tested 
as  well  as  the  makefile,  load  module,  and  test  data  from  the  corresponding  test 
module  in  the  base  view. 

4.  Software  Development  Processes 

Fairley  (13)  describes  a life-cycle  model  as  the  sequence  of  distinct  stages 
through  which  a software  product  passes  during  its  lifetime.  There  is  no  single, 
universally  accepted  model  of  the  software  life-cycle  according  to  Blum  (3)  and 
Zave  (44).  In  SAGA,  we  have  investigated  several  aspects  of  the  software  life- 
cycle. 


4.1.  Software  Design  Model 

In  many  models  of  the  life-cycle,  a requirements  specification  of  the  system 
to  be  built  is  created  early  in  the  lifecycle.  As  the  project  proceeds,  components 
of  the  software  system  are  built  and  verified  for  correctness  with  respect  to  this 
specification. The specification is validated when it is shown to satisfy the
customer's requirements. To help manage the complexity of software design and
development,  methodologies  which  combine  standard  representations,  intellectual 
disciplines,  and  well-defined  techniques  have  been  proposed  (Jackson  (20),  Wirth 
(42),  and  Yourdon  (43)).  In  the  SAGA  project,  we  are  developing  a formal  model 
for  the  development  process  and  using  it  to  study  a methodology  similar  to  the 
Vienna  Development  method  described  by  Jones  (21). 

A document  describing  the  function  of  a software  system  is  called  a func- 
tional specification  (13).  Design  introduces  the  algorithms  and  data  structures  to 
implement  a functional  specification.  In  this  paper,  we  will  argue  that  there  are 
three  separate  fundamental  issues  involved  in  developing  computer-based 
software  design  aids.  We  will  assume  that  the  development  process  consists  of  a 
number  of  refinement  steps.  The  first  concern  is  the  design  decision  to  select  one 
refinement  step  instead  of  another.  Design  decisions  are  difficult  to  formalize 
without  a better  understanding  of  the  development  process  and  the  application 
domain. 

The  second  concern  is  the  documentation  and  verification  of  a refinement 
step  or  implementation  decision.  Several  researchers  have  argued  the  need  for 
rigorous  argument  or  formal  verification  of  a refinement  step  using  proof  methods 
(21).  The  refinement  step  can  be  regarded  as  a correctness  preserving  transforma- 
tion from  an  abstract  program  to  a more  concrete  program.  Using  such  an 
approach,  the  verification  becomes  a record  of  the  refinement  steps. 

The  third  concern  is  the  development  process.  We  argue  that  a model  for 
the  development  process  is  required  in  order  to  reason  about  different  develop- 
ment methodologies  and  the  different  methods  of  verifying  refinement  steps. 


In  our  model  of  a development  process,  a functional  specification  defines  a 
potentially  infinite  number  of  implementations.  The  development  process  selects 
a single  implementation  from  this  large  set.  Each  refinement  step  produces  a 
derived functional specification or “abstract program” which constrains the
number  of  possible  implementations.  The  purpose  of  the  model  is  to  allow  a 
study  of  incremental  program  development.  Within  the  framework  provided  by 
the  model  we  can  compare  different  development  methodologies  and  investigate 
subtle  problems  in  a rigorous  manner.  By  separating  the  development  process 
from  the  issues  involved  in  performing  a refinement  step,  our  approach  provides  a 
framework  to  build  tools  that  support  a general  notion  of  a development  process 
and  that  are  independent  from  particular  design  methodologies.  We  hope  that 
the  model  can  also  help  justify  design  rules  which  permit  rigorous,  but  not  for- 
mal, arguments  of  correctness  by  construction. 
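A toy, finite instance of the model may make the idea concrete. The Python sketch below is an illustration invented for this purpose, not part of the SAGA formalism: the domain, codomain, and the three specifications are assumptions. Implementations are all functions from {0, 1, 2} to {0, ..., 3}; each refinement step conjoins a further constraint, so the set of admitted implementations can only shrink, and development ends when a single implementation remains.

```python
from itertools import product

DOMAIN = range(3)    # inputs 0, 1, 2
CODOMAIN = range(4)  # outputs 0..3

# Each candidate implementation is a total function DOMAIN -> CODOMAIN,
# represented by the tuple of its outputs (f(0), f(1), f(2)).
ALL_FUNCTIONS = set(product(CODOMAIN, repeat=len(DOMAIN)))


def admits(spec) -> set[tuple[int, ...]]:
    """The set of implementations that satisfy a specification."""
    return {f for f in ALL_FUNCTIONS if spec(f)}


def spec0(f):  # functional specification: the output is never below the input
    return all(f[x] >= x for x in DOMAIN)


def spec1(f):  # refinement step 1: additionally, strictly increasing
    return spec0(f) and all(f[x] < f[x + 1] for x in DOMAIN[:-1])


def spec2(f):  # refinement step 2: additionally, f(0) = 1
    return spec1(f) and f[0] == 1


sets = [ALL_FUNCTIONS, admits(spec0), admits(spec1), admits(spec2)]
print([len(s) for s in sets])                                           # [64, 24, 4, 1]
print(all(later <= earlier for earlier, later in zip(sets, sets[1:])))  # True
print(admits(spec2))                                                    # {(1, 2, 3)}
```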

4.2.  Executable  Specifications 

A major  problem  arising  in  the  design  of  software  is  the  accurate  determina- 
tion of  the  function  that  the  software  is  to  perform.  The  users  of  the  system 
being  constructed  may  not  really  know  what  they  want  and  they  may  be  unable 
to  communicate  their  desires  to  the  development  team.  If  a functional 
specification  is  in  a formal  notation,  it  may  be  an  ineffective  medium  for  com- 
munication with  the  customers,  but  natural  language  specifications  are  notori- 
ously ambiguous  and  incomplete. 

Functional  specifications  may  be  introduced  as  part  of  the  design  process 
(perhaps  describing  the  elements  of  an  abstract  program)  and  should  help  docu- 
ment the  design  process  as  well  as  enhance  the  designer’s  understanding  of  the 
design.  If  a formal  notation  is  used  for  such  specifications,  a designer  may  not  be 
sufficiently  well-motivated  to  document  his  design  with  a specification  because  it 
does  not  directly  contribute  towards  the  act  of  creating  a program.  However,  a 
natural  language  specification  may  be  too  imprecise. 

Prototyping  (Kruchten  et  al  (28))  and  the  use  of  executable  specification 
languages (Goguen and Meseguer (16), Kamin et al (22), Zave (44), Kemmerer (23))
have been suggested as partial solutions to these problems. Providing
the  customers  with  prototypes  for  experimentation  and  evaluation  may  increase 
communication  between  customers  and  developers  and  enhance  the  validation 
process.  Executable  specifications  used  in  the  design  process  provide  stubs  that 
allow  experimental  evaluation  of  the  algorithms  and  data  structures  of  a program 
being  developed  without  requiring  the  program’s  completion. 

Terwilliger  and  Campbell  (41)  describe  the  design  of  an  executable 
specification  language  called  PLEASE  for  use  in  the  SAGA  Project.  By  providing 
executable  programs  early  in  the  development  process,  errors  in  the  specification 
may  be  discovered  before  the  internal  structure  of  the  system  has  been  defined. 
We  believe  that  this  approach  will  enhance  the  software  development  process.  A 
methodology  for  using  executable  specification  languages  in  the  software  lifecycle 
is  being  examined  as  part  of  ENCOMPASS  (41). 


4.3.  An  Executable  Specification  Design  Method 

ENCOMPASS  supports  program  development  by  successive  refinement  using 
a similar  approach  to  that  of  the  Vienna  Development  Method  (Jones  (21),  Shaw 
et  al  (39)).  In  this  method,  programs  are  first  specified  in  a language  combining 
elements  from  conventional  programming  languages  and  mathematics.  These 
abstract  programs  are  then  incrementally  refined  into  programs  in  an  implementa- 
tion language.  The  refinements  are  performed  one  at  a time  and  each  is  verified 
before  another  is  applied.  Therefore,  the  final  program  produced  by  the  develop- 
ment correctly  implements  the  original  abstract  program.  The  ENCOMPASS 
software  development  paradigm  is  shown  in  Fig.  4.1. 

Terwilliger  and  Campbell  (40)  describe  how  abstract  programs  may  be  writ- 
ten in  PLEASE  and  refined  into  the  implementation  language  Path  Pascal 
(Campbell  and  Kolstad  (9)).  In  PLEASE,  a procedure  or  function  may  be 
specified  with  pre-  and  post-conditions  written  in  predicate  logic.  Similarly,  an 
abstract data type may be specified using an invariant. PLEASE specifications
may be used to argue correctness. They also may be transformed into prototypes
which use Prolog (Clocksin and Mellish (11)) to “execute” pre- and post-
conditions. These prototypes may interact with other modules written in conven-
tional languages.

Fig. 4.1 The ENCOMPASS Software Development Paradigm
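Executing a post-condition means finding a result that satisfies it. A Prolog prototype does this by resolution and backtracking; the Python sketch below substitutes a plain generate-and-test search over a finite candidate set, which is a simplification and not the PLEASE translation scheme. The integer square root specification and the helper names are invented for the example.

```python
from typing import Callable, Iterable, Optional


def execute_spec(
    pre: Callable[[int], bool],
    post: Callable[[int, int], bool],
    candidates: Callable[[int], Iterable[int]],
    x: int,
) -> Optional[int]:
    """'Execute' a specification by generate-and-test: return the first
    candidate result that satisfies the post-condition, or None if none does.
    (Generate-and-test stands in for Prolog backtracking; a simplification.)"""
    if not pre(x):
        raise ValueError(f"pre-condition violated for input {x}")
    for r in candidates(x):
        if post(x, r):
            return r
    return None


# An invented specification of integer square root.
def isqrt_pre(x: int) -> bool:
    return x >= 0


def isqrt_post(x: int, r: int) -> bool:
    return r * r <= x < (r + 1) * (r + 1)


def isqrt_prototype(x: int) -> Optional[int]:
    return execute_spec(isqrt_pre, isqrt_post, lambda n: range(n + 1), x)


print([isqrt_prototype(x) for x in range(10)])  # [0, 1, 1, 1, 2, 2, 2, 2, 2, 3]
```

The prototype is slow but needs no algorithm beyond the specification itself, which is what makes it useful before the design is settled.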

Lehman  et  al  (30)  propose  that  software  development  may  be  viewed  as  a 
sequence  of  transformations  between  specifications  written  at  different  linguistic 
levels.  Neighbors  (33)  describes  the  construction  of  a system  that  supports  a simi- 
lar development  methodology.  ENCOMPASS  supports  this  view  of  software 
development  by  allowing  abstract,  predicate  logic  based  definitions  of  data  types 
or  routines  to  be  transformed  into  successively  more  concrete  realizations.  The 
use  of  executable  specifications  allows  prototypes  for  two  or  more  linguistic  levels 
to  be  executed  using  the  same  input  data  and  the  results  compared  for  the  pur- 
poses of  verification  or  debugging.  An  executable  specification  provides  a frame- 
work for  the  rigorous  development  of  programs  in  a manner  similar  to  (21). 
Although  detailed  formal  proofs  are  not  required  at  every  step,  the  framework  is 
present  so  that  they  may  be  constructed  if  necessary.  (However,  it  is  our  experi- 
ence that  many  problems  arise  in  changing  a rigorous  argument  into  a mathemat- 
ical proof.) 
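Comparing two linguistic levels on the same input data amounts to differential testing. In this hypothetical Python sketch, the abstract level is an executable form of an invented integer square root specification and the concrete levels are candidate refinements; any input on which the levels disagree exposes either a faulty refinement or a faulty specification.

```python
def isqrt_spec_level(x: int) -> int:
    """Abstract level: search for the r described by the post-condition."""
    if x < 0:
        raise ValueError("pre-condition x >= 0 violated")
    return next(r for r in range(x + 1) if r * r <= x < (r + 1) * (r + 1))


def isqrt_newton(x: int) -> int:
    """Concrete level: integer Newton iteration (a correct refinement)."""
    if x < 2:
        return x
    r, y = x, (x + 1) // 2
    while y < r:
        r, y = y, (y + x // y) // 2
    return r


def isqrt_rounded(x: int) -> int:
    """A faulty refinement: rounds the square root instead of truncating it."""
    return round(x ** 0.5)


def disagreements(concrete, inputs) -> list[int]:
    """Run both levels on the same inputs and list the inputs where they differ."""
    return [x for x in inputs if concrete(x) != isqrt_spec_level(x)]


print(disagreements(isqrt_newton, range(1000)))  # []
print(disagreements(isqrt_rounded, range(10)))   # [3, 7, 8]
```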

Fig. 4.2 shows an example of a PLEASE specification for a SORT program. The specification is given in terms of a pre-condition and a post-condition for sort. Two predicates, “permutation” and “sorted”, are used by the post-condition. Terwilliger and Campbell (40) describe the translation of the specification into Prolog. In general, translating arbitrary specifications into executable programs is difficult. In theory, the guaranteed automatic production of a terminating program from an arbitrary specification written in first-order logic is not possible. One aspect of our future research will be to study what is possible in practice.

The specification may be used to validate the user requirements for sort, or it may be used as a test oracle for the subsequent refinements of sort (40). In addition, using rules similar to those provided in the Vienna Development Method (21), an argument for correctness can be constructed for the sort program based on the refinement steps used to build the program. Examples of some of the rules are given in (40).

4.4. Software Management Model

A management model for software development must identify, control, and record the development process. A management model can be based on a trace of the activities within the project. Such a trace can be used to understand the meaning of management in a manner similar to the use of traces in defining the meaning of a programming language (Campbell and Lauer (6)). The trace represents a complete history of all significant events that have occurred in the project. Projections from the trace permit identification of particular sequences of activities. Control can be expressed in terms of the valid continuations of a partially completed trace.
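The trace model can be sketched in a few lines. In the Python illustration below, a trace is a list of events. The event kinds `edit`, `compile`, `test`, and `release` are hypothetical. A projection keeps only the events that concern one component. Control is a predicate that decides whether an event is a valid continuation of a partially completed trace. The particular rule used is an assumed example policy, not one taken from the source: a component may be tested only if it has been compiled since its last edit.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Event:
    kind: str        # e.g. "edit", "compile", "test", "release" (illustrative kinds)
    component: str   # the module or document the event concerns

Trace = List[Event]

def project(trace: Trace, component: str) -> Trace:
    """Projection: the subsequence of events concerning one component."""
    return [e for e in trace if e.component == component]

def is_valid_continuation(trace: Trace, event: Event) -> bool:
    """Control: may `event` extend the partial trace?
    Assumed policy: test only after a compile since the last edit,
    and release only after a test since the last edit."""
    kinds = [e.kind for e in project(trace, event.component)]
    last_edit = max((i for i, k in enumerate(kinds) if k == "edit"), default=-1)
    since_edit = kinds[last_edit + 1:]
    if event.kind == "test":
        return "compile" in since_edit
    if event.kind == "release":
        return "test" in since_edit
    return True  # edits and compiles are always permitted in this sketch

def extend(trace: Trace, event: Event) -> Trace:
    """Append an event, refusing continuations the policy does not allow."""
    if not is_valid_continuation(trace, event):
        raise ValueError(f"{event.kind} of {event.component} not permitted here")
    return trace + [event]
```

For example, after `edit`, `compile`, `edit` on the same component, a `test` event is rejected until the component has been compiled again.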


program sort (input, output);

#include "integer_list.spec"

var input_list, output_list: integer_list;

predicate permutation (list1, list2: integer_list);
var front, back: integer_list;
begin
    (list1 = empty_list) and (list2 = empty_list)
  or
    (list1 = front || <hd(list2)> || back) and
    permutation(front || back, tl(list2))
end;

predicate sorted (l: integer_list);
var x: integer;
begin
    (l = empty_list)
  or
    forall (x | member(x, tl(l)), x >= hd(l)) and
    sorted(tl(l))
end;

pre-condition;
begin
    text_to_integer_list(input) <> integer_list_error
end;

post-condition;
begin
    (input_list = text_to_integer_list(input)) and
    permutation(input_list, output_list) and
    sorted(output_list) and
    (output_list = text_to_integer_list(output'))
end;

begin
end;

Fig. 4.2 A specification of a sort program.
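One way to see how the predicates of Fig. 4.2 can serve as a test oracle is to transcribe them almost literally into Python. This is a sketch under our own assumptions, and the Prolog translation of Terwilliger and Campbell (40) is not reproduced here. The modeling choices are:

* Integer lists are tuples, `||` is tuple concatenation, and `hd`/`tl` take the first element and the remainder.
* `front` and `back` in `permutation` are treated as existentially quantified. The code searches for them by trying every position at which `hd(list2)` occurs in `list1`.
* `text_to_integer_list` is assumed to read whitespace-separated integers.
* The prime on `output'` is ignored, and the program's final output text is used instead.

The function names mirror the figure. `sorted_` carries a trailing underscore only to avoid shadowing Python's built-in `sorted`.

```python
from typing import Tuple

IntList = Tuple[int, ...]
EMPTY_LIST: IntList = ()

def hd(l: IntList) -> int:
    return l[0]

def tl(l: IntList) -> IntList:
    return l[1:]

def permutation(list1: IntList, list2: IntList) -> bool:
    """(list1 = empty and list2 = empty) or, for some front and back,
    list1 = front || <hd(list2)> || back and permutation(front || back, tl(list2))."""
    if list1 == EMPTY_LIST and list2 == EMPTY_LIST:
        return True
    if list2 == EMPTY_LIST:
        return False  # hd(list2) is undefined, so the second disjunct fails
    h = hd(list2)
    for i, x in enumerate(list1):  # every split list1 = front || <h> || back
        if x == h and permutation(list1[:i] + list1[i + 1:], tl(list2)):
            return True
    return False

def sorted_(l: IntList) -> bool:
    """(l = empty) or (forall x in tl(l): x >= hd(l)) and sorted(tl(l))"""
    if l == EMPTY_LIST:
        return True
    return all(x >= hd(l) for x in tl(l)) and sorted_(tl(l))

def text_to_integer_list(text: str) -> IntList:
    """Assumed conversion; raises ValueError where the figure yields integer_list_error."""
    return tuple(int(tok) for tok in text.split())

def pre_condition(input_text: str) -> bool:
    try:
        text_to_integer_list(input_text)
        return True
    except ValueError:
        return False

def post_condition(input_text: str, output_text: str) -> bool:
    input_list = text_to_integer_list(input_text)
    output_list = text_to_integer_list(output_text)
    return permutation(input_list, output_list) and sorted_(output_list)
```

Used as an oracle, `post_condition(input_text, output_text)` is evaluated on the input and output of each run of a candidate refinement of sort. A `False` result flags the run.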


In ENCOMPASS, we are implementing a limited set of management functions to record and monitor activities, initiate activities, and inhibit inappropriate activities. Instead of using a detailed trace model of management, we have adopted a practical approach based on the larger granularity provided by milestones. We structure the management model of a software project into units of work which create well-defined products (Gunther (17)). The management objectives for each activity must define the pre-conditions under which the activity may occur, acceptance criteria for the products produced by the activity, and a procedure for evaluating whether the acceptance criteria have been met. The acceptance criteria evaluation procedure may be invoked at any time during the activity and produces status reports on the software product. Satisfaction of the pre-condition and of the acceptance criteria provides “milestone” events. A record of the occurrence of these milestones is stored in a management log. Accounting information may be associated with each unit of work. The log and accounting information can be used to generate reports and, when used with other information such as PERT schedules, to control the project.

Work units form a hierarchical structure. The reports generated by one work unit may satisfy a pre-condition or the acceptance criteria for another activity.
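The work-unit structure described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the ENCOMPASS implementation:

* A work unit carries a pre-condition and an acceptance predicate over a shared project state, which here is a plain dictionary.
* Evaluating a unit appends milestone records to a management log and returns a one-line status report.
* Hierarchy is expressed by letting one unit's pre-condition test for another unit's acceptance milestone.

All names, including `WorkUnit`, `evaluate`, and the state keys, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Milestone = Tuple[str, str]  # (work unit name, "started" | "accepted")

@dataclass
class WorkUnit:
    name: str
    precondition: Callable[[State, List[Milestone]], bool]
    acceptance: Callable[[State], bool]
    hours_charged: float = 0.0  # accounting information attached to the unit

@dataclass
class ManagementLog:
    milestones: List[Milestone] = field(default_factory=list)

    def record(self, unit: str, event: str) -> None:
        if (unit, event) not in self.milestones:  # each milestone is recorded once
            self.milestones.append((unit, event))

def evaluate(unit: WorkUnit, state: State, log: ManagementLog) -> str:
    """Acceptance-criteria evaluation procedure; may be run at any time.
    Returns a status report and logs any milestones reached."""
    if not unit.precondition(state, log.milestones):
        return f"{unit.name}: waiting on pre-condition"
    log.record(unit.name, "started")
    if unit.acceptance(state):
        log.record(unit.name, "accepted")
        return f"{unit.name}: accepted"
    return f"{unit.name}: in progress"

# Hierarchy: coding may start only once the design unit has been accepted.
design = WorkUnit("design",
                  precondition=lambda s, m: True,
                  acceptance=lambda s: bool(s.get("design_reviewed")))
coding = WorkUnit("coding",
                  precondition=lambda s, m: ("design", "accepted") in m,
                  acceptance=lambda s: bool(s.get("tests_pass")))
```

With an empty log, evaluating `coding` reports that it is waiting. Once `design` has been accepted, the same call records a `("coding", "started")` milestone and reports the unit as in progress.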

In ENCOMPASS, management monitoring, assessment, and control are implemented using makefiles, predicate evaluation, and Notesfiles. Periodic execution of makefiles is used to implement automated management and assessment of the project. The makefiles incorporate automatic evaluation of work unit pre-conditions, the creation of work units, the invocation of acceptance criteria evaluation procedures, and the creation of milestones when a pre-condition or the acceptance criteria are met. The Notesfiles (Essick (12)) record milestones and reports and propagate traceable management information to developers and managers.
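Because ENCOMPASS drives this process with periodically executed makefiles, the central idea is make's out-of-date rule applied to milestones. The Python sketch below imitates that rule rather than invoking `make`. Each milestone is a target with prerequisite milestones and a check that stands in for a predicate-evaluation command. One periodic pass (re)creates every target that is missing or older than a prerequisite, provided all prerequisites exist and the check succeeds. The `notify` list stands in for posting to a notesfile. The rule format and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Rule:
    target: str                 # milestone this rule creates
    prerequisites: List[str]    # milestones that must already exist
    check: Callable[[], bool]   # stands in for a predicate-evaluation command

def periodic_pass(rules: List[Rule], milestones: Dict[str, int], now: int,
                  notify: List[str]) -> None:
    """One scheduled run. As in make, (re)create a target when it is missing or
    older than one of its prerequisites, provided all prerequisites exist and
    the check succeeds. Rules are assumed to be listed in dependency order."""
    for rule in rules:
        if not all(p in milestones for p in rule.prerequisites):
            continue  # a pre-condition is not yet satisfied
        stamp: Optional[int] = milestones.get(rule.target)
        out_of_date = stamp is None or any(
            milestones[p] > stamp for p in rule.prerequisites)
        if out_of_date and rule.check():
            milestones[rule.target] = now
            notify.append(f"milestone {rule.target} reached at t={now}")
```

Re-stamping an early milestone, for example after a design change, makes every dependent milestone out of date. The next pass re-evaluates those milestones in order.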

For example, consider the implementation of a problem tracking system. Bug reports are mailed to the “problem definition” notesfile. They can be created by a user, a developer, or the execution of a program at a remote or local site. Debugging facilities within a software product can automatically report an internal error by invoking the Notesfiles mailer. Similarly, development tools may report errors; for example, the test harness may automatically report the detection of an error.

The problem definition notesfile records the site, author, time, address, and complaint. The “problem tracking manager” may set a timeout on the notesfile sequencer which specifies the acceptable interval within which a “problem definition analyst” should respond to the note. After the timeout expires, the notesfile automatically notifies the manager using a “management” notesfile.
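The timeout mechanism amounts to a periodic scan of unanswered notes. The Python sketch below assumes a simple in-memory representation. The `Note` fields follow the list in the text. The function name `check_timeouts` and the message format are hypothetical. The sketch escalates each note left unanswered beyond the manager's interval to a separate management notesfile, exactly once.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

@dataclass
class Note:
    site: str
    author: str
    time: datetime
    address: str
    complaint: str
    responses: List[str] = field(default_factory=list)
    escalated: bool = False

def check_timeouts(problem_notes: List[Note], management_notes: List[str],
                   timeout: timedelta, now: datetime) -> None:
    """Run by the sequencer: notify management about every note that has had
    no analyst response within `timeout`. Each note is escalated at most once."""
    for note in problem_notes:
        overdue = now - note.time > timeout
        if overdue and not note.responses and not note.escalated:
            management_notes.append(
                f"unanswered problem from {note.author}@{note.site}: {note.complaint}")
            note.escalated = True
```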

The problem definition analyst may respond to the note in several ways. A response may be created that identifies the problem as a user error. Alternatively, the analyst may create a request in a maintenance programmer’s “activity” notesfile to consider possible solutions to the problem.

The acceptance criteria for the programmer’s task are an assessment of the practical design issues involved in correcting the problem, a cost estimate of the work involved, and an implementation plan. While the programmer is considering possible solutions, the problem definition analyst or problem tracking manager may request progress reports. These reports may consist of any milestones accomplished and any preliminary documentation generated.

When the problem definition analyst is satisfied that the acceptance criteria for the task have been met, he may then submit a change request note to the project change request board (Fairley (13)). This milestone and the timetable of the change request board determine the conditions under which a meeting of the board is scheduled.
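The life of a single problem report in this example can be summarized as a small state machine. The Python sketch below is our own reading of the preceding paragraphs, not a specification from ENCOMPASS. The state and action names are hypothetical. Each transition corresponds to a note or response posted by the analyst or the maintenance programmer. A change request is permitted only once the three products named as acceptance criteria have been delivered.

```python
from typing import Dict, Set, Tuple

# (state, action) -> next state; an illustrative reading of the text above.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("reported", "user_error"): "closed",
    ("reported", "request_study"): "under_study",
    ("under_study", "progress_report"): "under_study",
    ("under_study", "submit_change_request"): "change_requested",
}

# Products the programmer's task must deliver before a change request is allowed.
ACCEPTANCE_CRITERIA: Set[str] = {"design_assessment", "cost_estimate",
                                 "implementation_plan"}

def step(state: str, action: str, delivered: Set[str]) -> str:
    """Apply one action to a problem report, enforcing the acceptance criteria."""
    if (state, action) not in TRANSITIONS:
        raise ValueError(f"action {action!r} not allowed in state {state!r}")
    if action == "submit_change_request" and not ACCEPTANCE_CRITERIA <= delivered:
        missing = sorted(ACCEPTANCE_CRITERIA - delivered)
        raise ValueError(f"acceptance criteria not met; missing {missing}")
    return TRANSITIONS[(state, action)]
```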

4.6. Project Libraries

Horowitz and Munson (19) suggest that the reuse of software can significantly reduce the cost of program development, and systems which contain libraries of previously coded modules and/or a number of standard designs for programs have been proposed by Lanergan and Grasso (29) and Matsumoto (31). In ENCOMPASS, any software component or group of components can be saved for later reuse. In addition to source and object code, documentation, formal specifications, proofs of correctness, test data, and test results can all be stored in the central library and later retrieved. The library can support a number of projects, both accepting and supplying components for reuse in all phases of development. The structure and organization of the library are shown in Fig. 4.3.

A programmer developing code will use a view of the project library to access shared code and data, test cases, specifications, design, and other products of the project. The workspace extends the view with local copies of code that are being modified and with new code. Eventually, the programmer will submit his workspace to be placed under the configuration management of the library. The configuration management of the workspace must be consistent with that of the library, and acceptance criteria may be applied to the software products before the library is updated. An integration test may be required as a pre-condition to a library update performed on a working version of the software system. A system acceptance test may be required as a pre-condition to a library update performed on a stable version of the software system. The project leader is responsible for correct library operation, including view and workspace creation and workspace integration.

Fig. 4.3 The ENCOMPASS Library Structure
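The relationship between library, view, and workspace resembles layered lookup combined with a guarded commit. The Python sketch below makes several assumptions. A dictionary of versions stands in for the configuration-managed library. Test outcomes are passed in as booleans rather than computed. All names are hypothetical. Lookups in a workspace see local copies first and fall back to the library view. Submitting a workspace checks the pre-condition that applies to the target version: an integration test for a working version, a system acceptance test for a stable one.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class Library:
    versions: Dict[str, Dict[str, str]]  # version name -> {component: content}
    stable: Set[str]                     # names of stable versions

@dataclass
class Workspace:
    library: Library
    version: str                         # the library version this view is of
    local: Dict[str, str] = field(default_factory=dict)  # modified or new code

    def read(self, component: str) -> str:
        """Local copies shadow the library view."""
        if component in self.local:
            return self.local[component]
        return self.library.versions[self.version][component]

    def submit(self, integration_ok: bool, acceptance_ok: bool) -> None:
        """Merge the workspace into its library version, guarded by the
        pre-condition appropriate to that version."""
        if self.version in self.library.stable:
            if not acceptance_ok:
                raise PermissionError("system acceptance test required")
        elif not integration_ok:
            raise PermissionError("integration test required")
        self.library.versions[self.version].update(self.local)
        self.local.clear()
```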

5. Software Tools

A large number of software tools are required to implement a computer-aided software development environment. Rather than build a large number of specialized tools, in SAGA we have chosen to build a small number of tools that can be specialized for specific purposes. Examples of such “generic” facilities are the Notesfiles system (Essick (12)), the SAGA language-oriented editor (Campbell and Kirslis (8)), the symbol table manager (Richards (35)), the tree editor (Hammerslag et al (18)), and the attribute evaluation schemes used for semantic evaluation (Beshers and Campbell (2)). The Notesfiles system is used for documentation and management. The editor can be specialized to edit many different languages, and specialized editors have been built for Pascal, Ada, PLEASE, and C. In addition to being few in number, the software tools in a software environment must also have other properties.

Software development environments need to be maintainable for as long as they are used to support a software development project (Campbell and Lauer (6)). The software tools in the environment must accommodate change and modification of the environment over the lifetime of the software project. In many applications, the software support environment and its tools must be maintained for the duration of the maintenance of the software product; in the case of an embedded system like the space station software system, this might be twenty or more years. Changes in hardware technology may require the environment to be ported to new computer systems. New tools may be integrated into the environment. A solution to the problem of maintaining the environment and tools for a long period of time is to design them as part of an “open architecture”.

In such an open architecture, modular tools are built which use standard interfaces to access other tools. The approach we have adopted in SAGA is to use the UNIX operating system to define a standard interface. UNIX processes become the mechanism for modularizing the tools. New software tools built for UNIX can be integrated into the environment, and UNIX provides a method of migrating the environment from one computer technology to the next. UNIX UNITED (Brownbridge et al (5)), LINK (Russo (36)), and other distributed UNIX systems permit the support of software development environments on networks of workstations.
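Using UNIX processes as the unit of modularity means tools communicate only through the standard process interface: arguments, byte streams on standard input and output, and exit status. The Python sketch below illustrates that kind of composition with two stand-in "tools". To keep it self-contained and portable, each tool is a tiny program run by the current Python interpreter (`sys.executable`) rather than a real SAGA tool. The helper name `run_pipeline` is hypothetical. For simplicity, it runs the tools one after another and passes each tool's complete output to the next. A shell pipeline would run them concurrently.

```python
import subprocess
import sys
from typing import List

# Two stand-in tools; each reads standard input and writes standard output.
UPPERCASE_TOOL = "import sys; sys.stdout.write(sys.stdin.read().upper())"
SORT_LINES_TOOL = "import sys; sys.stdout.write(''.join(sorted(sys.stdin.readlines())))"

def run_pipeline(tools: List[List[str]], data: str) -> str:
    """Feed `data` through each tool in turn, failing (CalledProcessError)
    if any tool exits with a non-zero status."""
    for argv in tools:
        result = subprocess.run(argv, input=data, capture_output=True,
                                text=True, check=True)
        data = result.stdout
    return data

if __name__ == "__main__":
    pipeline = [[sys.executable, "-c", UPPERCASE_TOOL],
                [sys.executable, "-c", SORT_LINES_TOOL]]
    print(run_pipeline(pipeline, "pascal\nada\nplease\n"), end="")
```

Any other program that follows the same stdin/stdout convention can be added to the list without changing the existing tools. That is the property the open architecture relies on.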

Software engineering studies reported by Bauer et al (1) suggest providing the user with a high-level interface which reflects the levels of abstraction in programming. By allowing the user to phrase commands in terms of high-level concepts, the quality of the user’s interaction with the computer can be improved. Less time is needed to accomplish a given task, and fewer operations mean fewer errors made during the software development process. Since users spend a large amount of their time using editors, Scofield (38) proposed the editor as an appropriate program in which to implement a high-level interface.

In the remainder of this section we discuss some of the SAGA tools that have been developed based on these ideas.

5.1. Language-Oriented Editor

Language-oriented editors supply a high-level interface for software development tools (Campbell and Richards (7), Campbell and Kirslis (8)). Since the editor is the primary tool for constructing software products, enhancing it with features that aid the editing of specific specification languages and programming languages should benefit the development process. Such editors can provide syntactically and semantically oriented editing commands, and may help the program development process by preventing syntactic and semantic errors in the program text or by diagnosing them immediately.

Two different approaches may be used to construct a language-oriented editor: the generator, or “template”, approach and the recognizer approach. The SAGA project has developed a recognizer-based editor. The editor incorporates an LALR(1) parser augmented for the interactive environment with incremental parsing techniques (Kirslis (25), Ghezzi and Mandrioli (15)). An editor generator (25) allows editors to be generated for a particular language.

The SAGA project has demonstrated (25) that the recognizer approach is a practical basis for constructing language-oriented editors and has several advantages:

1. The recognition approach can be applied consistently to the editing of the lexical, syntactic, and semantic components of the language. This simplifies providing uniform editing commands that manipulate lexical, syntactic, and semantic entities. Template editors are tedious to use if they do not use a recognizer to enter expressions, variable names, and constants; if they do use one, an editing command will differ in operation depending upon whether an entity was recognized or generated.

2. The recognition approach permits arbitrary editing operations on the program. Rectangular blocks of characters may be copied from one part of a screen of program text to another, as when initial assignments are being made to array elements. Global string substitutions may be made. Program code may be commented out, and comments may be changed into program code. The generator approach cannot handle arbitrary editing commands unless the text produced by the edit is reparsed into a form suitable for the editor. Problems occur when such an edit creates a lexical or syntactic error.

3. Program editing during the debugging and maintenance phases of a project will invariably require transforming the program through a number of illegal lexical, syntactic, and semantic constructs. Many editors using the generator approach expressly forbid the creation of incorrect programs. The recognition approach, however, permits illegal programs, which may contain many semantic, syntactic, and lexical errors. The errors may be introduced in any order and removed in any order. When a lexical or syntactic error is introduced, the editor can mark the discontinuity in the corresponding token or parse tree. When an error is removed, the incremental parsing technique examines the surrounding context of the change only as far as is necessary to determine that the change results in a lexically and syntactically correct program fragment. The parse tree is then repaired in the local context of the change.

4. The recognition approach allows a lexical or syntactic entity such as a Pascal while loop to be incrementally changed into a repeat loop, whereas the generator approach must include a transformation rule to support such a modification. Although it is simple to generate a set of useful transformation rules, it is not clear whether it is possible to generate all useful transformations of this form.

5. The recognition approach uses existing compiler generation and parsing techniques without major alteration. If standard compiler generation and parsing tools are used, then many existing specifications of the lexical, syntactic, and semantic components of a programming language can be used directly by an editor generator facility to produce corresponding language-oriented editors.

6. Semantic analysis is performed in most language-oriented editors using recognition techniques that extend those developed for compilers. For example, the attribute evaluation schemes proposed by Knuth (27) have been used directly, or encoded in a procedural manner, to provide semantic evaluation of programs in the edited languages (Reps et al (34)).
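The claim in item 3, that an incremental technique examines the context of a change "only as far as is necessary", is easiest to see at the lexical level. The Python sketch below illustrates that idea under stated assumptions. It is not the SAGA editor's incremental LR(1) parser, and the toy token syntax (identifiers, integers, and single non-blank characters) is our own. The function `relex` works in three steps:

1. It resumes scanning at the end of the last token that finishes strictly before the edit. Tokens up to that point cannot be affected, because each token's extent depends only on the text up to the character that stops it.
2. It stops as soon as a freshly scanned token coincides, after shifting, with an old token lying wholly after the edit.
3. It reuses the remaining old tokens with shifted offsets.

Resynchronization is safe here because this lexer's tokens depend only on the text from their start position onward.

```python
import re
from typing import Iterator, List, Tuple

Token = Tuple[int, int, str]  # (start offset, end offset, token text)

# Toy token syntax: identifiers, integers, or any other single non-blank char.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

def lex_from(text: str, pos: int) -> Iterator[Token]:
    """Yield the tokens of `text` from offset `pos` onward."""
    while True:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            return
        yield (m.start(1), m.end(1), m.group(1))
        pos = m.end()

def lex(text: str) -> List[Token]:
    return list(lex_from(text, 0))

def relex(old_text: str, old_tokens: List[Token], start: int, end: int,
          new: str) -> Tuple[str, List[Token], int]:
    """Replace old_text[start:end] with `new` and update the token list.
    Returns (new text, new tokens, number of freshly scanned tokens)."""
    text = old_text[:start] + new + old_text[end:]
    delta = len(new) - (end - start)
    # Tokens ending strictly before the edit cannot be affected by it.
    i = 0
    while i < len(old_tokens) and old_tokens[i][1] < start:
        i += 1
    resume = old_tokens[i - 1][1] if i > 0 else 0
    new_tokens = old_tokens[:i]
    # Old tokens lying wholly after the edit, keyed by their shifted start.
    tail = {s + delta: k for k, (s, _, _) in enumerate(old_tokens) if s >= end}
    fresh = 0
    for tok in lex_from(text, resume):
        k = tail.get(tok[0])
        if k is not None and old_tokens[k][2] == tok[2]:
            # Resynchronized: every remaining old token is still valid, shifted.
            new_tokens.extend((s + delta, e + delta, t)
                              for s, e, t in old_tokens[k:])
            return text, new_tokens, fresh
        new_tokens.append(tok)
        fresh += 1
    return text, new_tokens, fresh
```

As an example, replacing `alpha` with `beta` in `x = alpha + 42` rescans one token. The lexer then resynchronizes on `+`, and the tokens `+` and `42` are reused.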

The SAGA editor has been used with various semantic evaluation methods. Beshers and Campbell (2) describe an approach combining the editor with right regular expression grammars, attributed grammars, and maintained and constructor attributes. This method was proposed to overcome some of the overhead that occurs in direct attribute evaluation schemes. A SAGA editor for a subset of Pascal has been built that incrementally compiles Pascal programs using more conventional techniques (Kimball (24)).
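Attribute evaluation in an editor is usually made incremental so that an edit does not force re-evaluation of the whole tree. The Python sketch below shows the basic idea for a single synthesized attribute: the type of an expression, computed bottom-up from its children. It rests on our own simplifying assumptions:

* Each node caches its attribute.
* An edit marks the edited node and its ancestors as dirty.
* Re-evaluation recomputes only the dirty nodes and reuses cached values elsewhere.

It is neither the Beshers and Campbell scheme nor Kimball's incremental compiler. The node kinds and type rules are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    kind: str                                   # "int", "real", or "+"
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    cached_type: Optional[str] = None           # cached synthesized attribute
    dirty: bool = True                          # attribute must be recomputed

@dataclass
class Stats:
    evaluations: int = 0                        # attribute computations performed

def add(left: Node, right: Node) -> Node:
    node = Node("+", [left, right])
    left.parent = right.parent = node
    return node

def type_of(node: Node, stats: Stats) -> str:
    """Synthesized attribute: int + int is int; a real operand makes it real.
    Clean nodes return their cached value without visiting their children."""
    if not node.dirty and node.cached_type is not None:
        return node.cached_type
    stats.evaluations += 1
    if node.kind == "+":
        operand_types = [type_of(c, stats) for c in node.children]
        node.cached_type = "real" if "real" in operand_types else "int"
    else:
        node.cached_type = node.kind
    node.dirty = False
    return node.cached_type

def replace_leaf(leaf: Node, new_kind: str) -> None:
    """Edit a leaf, then invalidate it and its ancestors (and nothing else)."""
    leaf.kind = new_kind
    n: Optional[Node] = leaf
    while n is not None:
        n.dirty = True
        n = n.parent
```

For a balanced tree of three `+` nodes over four integer leaves, the first evaluation computes all seven attributes. Changing one leaf to `real` then costs three computations: the leaf, its parent, and the root.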

One of the major problems in building language-oriented editors is that they provide an unfamiliar interface to the user. To overcome this problem, a new version of the SAGA editor is being constructed using an EMACS editor front end.

5.2. Notesfiles

An important software development tool for any project is a means to record, document, and retrieve information. Such a tool can be used to support technical discussions, product reviews, problem tracking, agendas and minutes, grievances, design and specification documentation, lists of work to be done, appointments, news, and mail. The SAGA Notesfiles system (Essick (12)) has been in use for some time to support all these functions within the SAGA project.

The Notesfiles system is a distributed project information base constructed for SAGA on the UNIX operating system. A file of notes can be maintained across a network of heterogeneous machines. Each file of notes has a topic; each notesfile has a title. A sequence of notes is associated with each notesfile. Notes and responses may be exchanged between separate notesfiles. Notes and responses are documented with their authors and times of creation. Updates to the notes and responses are transmitted among networked systems to maintain consistency; Notesfiles use the standard electronic mail facility to carry these updates. A library and standard interface permit any user program to submit a note or response to a notesfile. This library has been particularly useful in the construction of automatic logging and error reporting facilities in test harnesses and in "beta test" uses of SAGA code.
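The library interface for submitting notes is what makes automatic error reporting cheap to add to a program. The actual Notesfiles library calls are not described here. The Python sketch below therefore defines a hypothetical `submit_note(notesfile, title, text)` function, backed by an in-memory dictionary, that stamps each note with its author and creation time. It also defines a decorator, `reports_errors`, which posts any uncaught exception from a wrapped routine to a problem-report notesfile before re-raising it. Both names are our own.

```python
import functools
import os
import traceback
from datetime import datetime
from typing import Dict, List

# Hypothetical in-memory stand-in for the Notesfiles store: notesfile -> notes.
NOTESFILES: Dict[str, List[dict]] = {}

def submit_note(notesfile: str, title: str, text: str) -> None:
    """Hypothetical library call: post a note stamped with author and time."""
    NOTESFILES.setdefault(notesfile, []).append({
        "title": title,
        "text": text,
        "author": os.environ.get("USER", "unknown"),
        "time": datetime.now(),
    })

def reports_errors(notesfile: str):
    """Decorator: post any uncaught exception to `notesfile`, then re-raise."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                submit_note(notesfile,
                            f"internal error in {func.__name__}: {exc!r}",
                            traceback.format_exc())
                raise
        return wrapper
    return decorate
```

A test harness or a "beta test" build could wrap its entry points with `@reports_errors("problem_definition")`. Failures would then reach the problem-tracking notesfile without any change to the program's normal error handling.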

6. Conclusion

One approach to improving the productivity of large-scale software development is to construct software systems that support the software development process. The design of such systems requires an understanding of the principles underlying the software development and maintenance process, as well as methods and technologies for building complex design aids. We argue that the experimental research required to build such environments should be based on formal models of the software development process. Much research is required to produce both the appropriate formal models and the methods and techniques for implementing such environments.

In the SAGA Project, we have been studying the construction of an environment to support software development and maintenance. In this paper, we have outlined some of the models being developed in association with the construction of an experimental environment called ENCOMPASS.

The research described in this dissertation supports the thesis that a language-oriented editor for full programming languages, and for other languages specifiable with context-free LR(1) grammars, can be based upon an incremental LR(1) parser employing incremental analysis techniques. The resulting editor is flexible, supporting a higher-level command interface which includes structure-oriented commands involving tokens and sub-trees, while retaining common text editing commands which operate on arbitrary groups of characters and lines. This editor can be used to develop practical programs which incorporate software engineering principles concerning the design and construction of software systems. In this dissertation, an incremental parsing algorithm suitable for use with an interactive editor is developed. A new solution to the handling of comments in syntax trees is proposed, and an error-recovery algorithm which permits editing of the parse tree in the midst of syntax errors is presented. The resulting editor, its commands, and its environment are described. The editor can be retargeted to other languages and can use any parser-generating system which can meet its interface. A prototype editor which employs these algorithms has been implemented as part of the SAGA project as a demonstration of the practicality and flexibility of this approach; this editor has been in experimental use during the past couple of years at the University of Illinois at Urbana-Champaign.

PREFACE


This dissertation details an alternative approach to the syntax-directed, template-driven, language-sensitive editors which are receiving much attention at present. Rather than displaying the internal tree structure of the program being edited, our editor displays the program in text form on the terminal screen; no non-terminals appear. Instead of restricting the editing commands to structure-only commands at certain points in the program and text-only commands at other points, our approach permits both structure-oriented commands on tokens and trees and common text-oriented commands on arbitrary groups of characters and lines, with each type of command available anywhere in the program. The syntax checking provided by the parser gives the programmer feedback about the correctness of his program as he edits it, without requiring him to keep the program text syntactically correct at all times or to repair syntax errors as soon as they arise. This combination of feedback and flexibility should appeal to experienced programmers, and I believe that this approach to editing is practical and will be favorably received.

An understanding of the SAGA editor and the ideas behind it can be obtained through a reading of Chapters 1, 3, 6, and 8. More in-depth information about the internal structure of the parse tree and the incremental LR(1) parsing algorithm in use can be found in Chapters 4 and 5, although these chapters need only be read to gain insight into how the editor works internally. Chapter 5 presents the incremental parsing algorithm in enough detail to guide another implementation of the incremental parser, should one wish to extend the ideas presented here in future work. Chapter 7 describes the generation of new editors and the parser generators in use by the SAGA project. Finally, Chapter 2 contains a detailed look at some of the previous work in the area.

A prototype editor has been produced as a demonstration of the feasibility of the ideas presented in this dissertation, and has been in experimental use since 1982 at the University of Illinois. I have enjoyed the time I have spent on this research, and my contacts with other students who have based Master’s theses and class projects upon this editor. I wish the best to the

CHAPTER 1
INTRODUCTION 


Software complexity and cost are severe problems in software development. To keep costs down, programmer productivity needs to be improved. The more powerful computers available today permit the writing of more complex programs which were previously not feasible, and programmers can now make better use of tools to receive more analysis at an earlier point in the software development cycle. However, the existing software tools are not always adequate to manage the large amount of software required in many new projects; new tools are needed. Properly designed tools that put these new hardware resources to best use can improve programmer productivity.

Software engineering research addresses many of the problems in software development and offers formal results and insights toward the solution of current problems. Results from software engineering [Bauer et al., 77] suggest providing the user with a high-level interface which reflects the levels of abstraction in programming. Since the user can phrase commands in terms of high-level concepts, the quality of the user’s interaction with the computer can be improved. Less time will be needed to accomplish a given task, and fewer operations mean fewer errors made during the software development process.

Since users spend a large amount of their computer time using editors, an editor is an appropriate program in which to implement a high-level interface. The interface can support the concept of structured programming. Exploiting the syntactic and semantic properties of a programming language supports levels of abstraction and allows a programmer to build his program using either bottom-up integration or top-down refinement. Since more computing resources are available, one approach toward providing such an interface is to perform additional analysis that until now was done at a later time or in another program; the user benefits by receiving directive or diagnostic information much sooner than before.

Parsing theory is well developed, but until recently it has been applied in a static environment in which parsers are run non-interactively and take their input from a file. Parsing techniques must be modified for an interactive language-oriented editor, in which input is received incrementally from a user and a large portion of text which has already been parsed may be modified. Initial results concerning the organization and complexity of incremental parsers have appeared [Celentano, 78], [Ghezzi and Mandrioli, 79], [Ghezzi and Mandrioli, 80] that suggest that such methods can be applied in practical interactive environments such as a language-oriented editor, and that reasonable response times can be maintained while performing this increased computation.

1.1. Syntax-Directed vs. Language-Oriented Editing

Language-oriented editors have been proposed to provide a high-level interface for software development tools. Two different approaches may be used to construct language-oriented editors. The generator approach (often called the template approach) constrains the editing commands so that only valid programs can be developed. The recognizer approach supports both normal text editing commands and additional language-oriented commands, employing an incremental LR(1) parser to detect lexical, syntactic, and semantic errors in program fragments. The recognizer approach provides a more flexible editing environment for program development and maintenance. This second approach is presented in this dissertation.

1.1.1. The Generator Approach

To date, much work with language-specific editors has followed the generator approach, producing syntax-directed editors [Hansen, 70], [Donzeau-Gouge et al., 75], [Medina-Mora and Feiler, 81], [Teitelbaum and Reps, 81], [Reiss, 84]. Such editors have a particular language structure imposed upon them, resulting in an editor driven by commands which are constrained to follow the specific language structure. The user of such an editor is presented with a program skeleton. He selects language constructs from a menu and places them at predetermined points in the program display. The method constrains the user’s interaction with the editor to operations which produce error-free syntax, although semantic errors are still possible. Some typing is also saved, since the user never types keywords or punctuation.

This approach has proven popular with implementors for several reasons. First, the user interface is simple; a small set of menu-driven commands permits construction of the tree in a well-defined (error-free) manner. Second, the implementation is straightforward; since the user is not permitted to make syntax errors, no error detection, recovery, or correction code is needed. Third, a set of templates representing the constructs of a language can be constructed without much difficulty. Fourth, the resulting editor is a very nice teaching tool for novice programmers, since at each editing step the user is channeled along a narrow path with few choices. Users can build programs free of syntax errors more easily than with traditional text editors, which permit syntax errors that may be obscure to the new programmer.

However, this approach is inflexible, and modifications to existing programs can be difficult.
In order to replace one construct with another, the sub-trees must first be removed from the
template and saved somewhere; then the template is deleted, another is selected to replace it, and
finally the sub-trees are re-inserted. Two examples of modifications which illustrate this
difficulty are the addition of an else clause to an if-then statement, and the expansion of a
single statement into a block of two or more statements. It is also not practical to build the
program entirely by selection of templates down to the lowest expression level, since the many
selections needed become tedious. Therefore, at the lowest levels of the parse tree, expressions
are input from the keyboard and parsed, and certain kinds of errors are possible.
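To make the cost of this replace-a-construct procedure concrete, the following minimal Python sketch walks through the four steps a template editor forces on the user when an else clause is added to an if-then statement. The `Node` class, the template names `if_then` and `if_then_else`, and the `<...>` placeholder spelling are hypothetical illustrations, not taken from any of the editors cited above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a hypothetical template tree; kind is a template name or a placeholder."""
    kind: str
    children: list = field(default_factory=list)


def placeholder(name):
    # An unexpanded slot in a template, displayed as <name>.
    return Node("<" + name + ">")


def add_else_clause(parent, index):
    """Replace the if_then template in parent.children[index] by an if_then_else
    template using only clip / delete / select / re-insert operations.
    Returns a trace of the user-visible steps."""
    trace = []
    old = parent.children[index]
    # 1. Clip the condition and then-part out of the old template and save them.
    saved_cond, saved_then = old.children
    trace.append("clip sub-trees")
    # 2. Delete the old template; a statement placeholder is left in its slot.
    parent.children[index] = placeholder("statement")
    trace.append("delete template -> " + parent.children[index].kind)
    # 3. Select the replacement template from the menu; all of its slots are empty.
    new = Node("if_then_else",
               [placeholder("condition"), placeholder("statement"), placeholder("statement")])
    parent.children[index] = new
    trace.append("select template -> " + new.kind)
    # 4. Re-insert the saved sub-trees into the matching slots of the new template.
    new.children[0] = saved_cond
    new.children[1] = saved_then
    trace.append("re-insert sub-trees")
    return trace


body = Node("block", [Node("if_then", [Node("x > 0"), Node("y := 1")])])
steps = add_else_clause(body, 0)
```

After the four steps, the else-part is still an empty `<statement>` slot that the user must expand separately. In a recognizer-based editor, the same change is a single text insertion of `else` followed by the new statement.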

Unfortunately, syntax-directed template editors have not been accepted by experienced
programmers [Waters, 82]. In fact, one indication of the lack of utility of such editors is that
their developers do not use them in their own program development. Experienced programmers are
not troubled by syntax errors, and for them error repair is a simple and straightforward task;
the restrictive editing environment provided by these editors is of no benefit to them. Commands
which operate on arbitrary groups of characters or lines are not provided. Comments also cause
great difficulty for template editors. They are usually handled by permitting (or requiring)
comments at certain places, and prohibiting them anywhere else. When comments are permitted, their
placement is often restricted to a certain format, and block copying of combined syntactic
structures and comments is difficult.

1.1.2.  The  Recognizer  Approach 

The recognizer approach employs some type of parser to analyze character strings entered
by the user. A recursive descent parser was used by [Wilcox et al., 76] in an educational system,
and a bottom-up parser by [Horton, 81] in his editor; both provided text interfaces to the user,
and supported editing operations which manipulated strings of characters. When editing programs
using the recognizer approach, the user typically inputs his text in free format; this input is
analyzed by an incremental parser, and immediate feedback is provided about the correctness of the
program. With this approach, it becomes possible to specify editing operations on syntactic and
semantic entities such as tokens, sub-trees, or items with particular semantic attributes, in
addition to operations on arbitrary groups of characters or lines. With a parser in the editor
performing incremental compilations on portions of the parse tree, the compiler is needed less
often for detecting syntactic and semantic errors; potentially many compilation runs can be saved,
since the compilation should succeed the first time the user runs the compiler. Still further
improvement results if the editor passes the parse tree directly to the compiler, and the compiler
selectively recompiles only those program fragments which changed during the editing session.

Since a recognizer is used, editing commands can be supported which take the program
through intermediate, incorrect states; this facilitates some editing operations, such as the
insertion of a widely spaced begin ... end pair. It also permits the editor to provide the user
with program-specific information in the form of valid continuations of a parse, which the
recognizer can calculate from the current parse state and parse stack context; error repair is
thus simplified in cases where the error is not immediately obvious.
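One way to compute such valid continuations is sketched below in Python, under stated assumptions. For each terminal, the parser is simulated on a copy of the state stack, performing whatever reductions the table calls for, and the terminal is accepted if a shift or accept action is eventually reached. The reductions have to be simulated because SLR(1) and LALR(1) tables may reduce before discovering that a symbol is an error. The grammar (E → E + T | T, T → id), its hand-built SLR(1) table, and the end marker `$` are a small illustrative example, not the SAGA editor's tables.

```python
# Productions: number -> (left-hand side, length of right-hand side).
#   1: E -> E + T     2: E -> T     3: T -> id
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}

# SLR(1) ACTION table: (state, terminal) -> ("shift", state) | ("reduce", production) | ("accept",)
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$"): ("accept",),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 3), (3, "$"): ("reduce", 3),
    (4, "id"): ("shift", 3),
    (5, "+"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
# GOTO table: (state, non-terminal) -> next state.
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 5}
TERMINALS = ("id", "+", "$")


def valid_continuations(stack):
    """Return the terminals that can legally follow the input parsed so far,
    given the parser's state stack (bottom of stack first)."""
    result = set()
    for a in TERMINALS:
        states = list(stack)            # simulate on a copy; the real stack is untouched
        while True:
            action = ACTION.get((states[-1], a))
            if action is None:          # error entry: a cannot follow here
                break
            if action[0] in ("shift", "accept"):
                result.add(a)
                break
            lhs, rhs_len = PRODUCTIONS[action[1]]
            del states[-rhs_len:]       # pop one state per right-hand-side symbol
            states.append(GOTO[(states[-1], lhs)])
    return result
```

For example, after the input `id + id` the stack is `[0, 1, 4, 3]`, and the simulation reports `+` and `$` as the only valid continuations. An editor could display these to the user in place of a bare "syntax error" message.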

The productions of the grammar used to specify the language are user-transparent; that is,
none of the editing commands, error diagnostics, or development aids are based upon information
that is not directly representable as elements of the concrete syntax. The user sees a
text-oriented display of his program similar to the screen-oriented text editors available today;
no non-terminal symbols appear, and it is not necessary to become acquainted with the internal
grammatical structure of the production rules used to describe the language in order to use the
editor effectively.

The resulting editor is flexible, incorporating an incremental LR(1) parser with incremental
analysis techniques to analyze the user's input and provide immediate feedback about its
correctness. The editor supports a higher-level command interface, which includes
structure-oriented commands involving tokens and sub-trees, and retains common text editing
commands, which operate on arbitrary groups of characters and lines. In this dissertation, an
incremental parsing algorithm suitable for use with an interactive editor is developed. A new
solution to the handling of comments in syntax trees is proposed, and an error-recovery algorithm
which permits editing of the parse tree in the midst of syntax errors is presented. A prototype
editor which employs these algorithms was implemented beginning in 1981 as a demonstration of the
practicality and flexibility of this approach; this editor has been in experimental use over the
past couple of years.

1.2.  The  SAGA  Project 

The SAGA (Software Automation, Generation, and Administration) project is investigating
formal and practical aspects of computer-aided support for program development in the software
life cycle [Campbell and Kirslis, 84], [Campbell and Richards, 81]. The goal of the project is to
design a practical software development environment that supports all major phases of the life
cycle. The design of the system requires facilities for constructing a language-oriented editor
for a large class of formal languages, including many programming languages, specification
languages, and design languages. The language-oriented editor presented in this dissertation is
the editor of the SAGA project, and will at times be referred to as the SAGA editor.

The SAGA editor provides a means by which the syntactic and semantic properties of a
programming language (or other formal language) can be exploited to provide a more useful
interactive environment for the user. Character, line, and screen editing commands are augmented
by commands based on the syntax and semantics of the particular language being edited. Fragments
of the edited text may be selected by their syntactic (and eventually semantic) structure and
moved, copied, deleted, or even transformed into other well-defined syntactic constructs. The same
properties may be exploited to constrain a programmer to structure the development of a program
using a particular methodology, if desired. The editor is being applied in a software development
environment of coordinated tools [Kirslis et al., 85], which provides additional support for the
management of software development.

Structured editing can also be applied to abstract specification languages. The editor can
be used to enter and verify sentences in the language; other software tools can use the structures
generated by the editor to verify both the specifications and the programs subsequently written to
implement them. In each case, providing the higher-level interface lets the user deal productively
with relevant concepts instead of lower-level components: fewer operations are needed, fewer errors
will be made, and less time will be needed to accomplish the task.

1.3.  Chapter  Summary 

The remainder of this dissertation discusses the design and structure of an editor based
upon an incremental LR(1) parser. Chapter 2 reviews previous work. In Chapter 3, shift-reduce
parsing is reviewed, with an emphasis on the attributes of a parse tree node that can be added to
support incremental parsing. Some possible parse tree structures are investigated in Chapter 4,
and the one is chosen which best supports the incremental parser, permits the editor to operate
directly from the parse tree, and supports related software development tools used in the SAGA
environment. The incremental LR parser proposed by Ghezzi and Mandrioli is taken as a starting
point in Chapter 5, and the extensions necessary to support an editor with incremental parsing are
presented and discussed. The integration of the incremental parser with the editor, the basic text
and structure editing capabilities, the flexibility of the user interface, and the design of the
SAGA editor as a hierarchy of modules are presented in Chapter 6. Chapter 7 describes the SAGA
editor-generating facility, which permits the use of different parser-generator and
compiler-generator systems to automatically construct a SAGA editor for formally specified
languages. The lexical, syntactic, and semantic analyses are performed by separate modules within
the editor, each containing logically independent data structures. This independence is required
in order to effectively implement separate incremental lexical, syntactic, and semantic analysis.
It also permits reuse of the remaining editor modules whenever an editor is constructed for a new
language, since none of these modules contains any language-specific information. Finally,
Chapter 8 presents the conclusions from this research, describes some applications in which the
editor has already been tested in the SAGA development environment, and suggests some future work.
CHAPTER 2

PREVIOUS  WORK 


The idea of an editor for formally specified structured data is not a new one. In the late
1960s, attention was directed to the editing of general hierarchies of text [Engelbart and English,
68] and sections of annotated, linked text [Carmody et al., 68]. In 1971, Hansen used an extended
BNF formalism to describe hierarchic text, and produced EMILY, a template editor for structured
programs which was also retargetable to other formally specified structures [Hansen, 71].

A user of EMILY generated programs by the application of syntactic rules. The user selected
syntactic constructs from a menu, building a tree from the root out to the leaves in successive
refinement steps. EMILY was based on two principles: selection not entry, applied to text
construction and operation invocation; and predictable behavior, which relied on a small set of
concepts that a user can grasp with a little practice.

The extended BNF formalism provided three features: indentation and carriage returns
could be specified for the formatting of the display; conditional display operations could test
the contents of subnodes and the identifier of the parent of the node to provide flexible display
operations; and identifier and block structure could be described in the formalism, so that the
system could keep track of all references to identifiers. EMILY also supported the elision of
selected lines of the display, which Hansen termed holophrasting; he also defined a visible marker,
the holophrast, to represent the elided subtrees on the display.
The languages implemented for EMILY were PL/I, GEDANKEN [Reynolds, 70], a hierarchy
language for thesis outlines, and the EMILY syntax language. EMILY was written in PL/I, and
implemented on an IBM 2250 Graphics Display Unit attached to an IBM 360 Model 75. Hansen
found that program construction took longer than with a text editor, but that the user made
fewer mistakes. EMILY consumed too many CPU cycles and too much memory to be practical at the
time, but Hansen postulated that with decreasing computer costs and increasing human costs, his
approach would eventually become more feasible, and history has shown him to be correct.

In 1975, Donzeau-Gouge, Huet, Kahn, Lang, and Levy produced a text editor specialized
for editing program texts, and applied it to the Pascal programming language. It was a first step
in building what became the MENTOR programming environment at INRIA-LABORIA, in
Rocquencourt, France [Donzeau-Gouge et al., 75, 79, 80]. In their system, programs were
manipulated as abstract objects; no parse tree existed. The abstract objects were labeled trees,
also called operator-operand trees, in which internal nodes are operators. The programs were
written in a concrete syntax, but stored as abstract syntax trees. An unparser was used to
regenerate the program text. The user of the editor specified sub-trees by structural addresses.
A constructor performed syntax analysis, to permit pre-existing code to be edited. A separate
process later performed semantic analysis.

The user looked at sub-trees through a window, and an integer n, the holophrasting depth,
was attached to the sub-tree to specify the level of detail to display. Long lists could be rolled:
the beginning and end were hidden, and a portion of the middle was displayed. Comments could be
attached to a node, either as a prefix or as a postfix. Comments were not normally displayed, but
could be called up. Evaluators performed computations on the abstract tree. The editor was written
in Pascal.
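The effect of a holophrasting depth can be illustrated with a short Python sketch. It renders a tree down to n levels below the viewed sub-tree and replaces each deeper, non-empty sub-tree with a visible marker, in the spirit of Hansen's holophrast. The `Tree` class, the indentation scheme, and the marker text `{...}` are hypothetical choices for illustration; MENTOR's actual display conventions are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)


HOLOPHRAST = "{...}"   # hypothetical visible marker for an elided sub-tree


def holophrast(tree, n, indent=0):
    """Return display lines for tree, showing n levels below it and
    replacing every deeper non-empty sub-tree by the marker."""
    pad = "  " * indent
    if n == 0 and tree.children:
        # Depth exhausted: show this node but elide everything beneath it.
        return [pad + tree.label + " " + HOLOPHRAST]
    lines = [pad + tree.label]
    for child in tree.children:
        lines.extend(holophrast(child, n - 1, indent + 1))
    return lines


program = Tree("program", [Tree("decl", [Tree("var x")]), Tree("stmt")])
```

With depth 1, `holophrast(program, 1)` yields `program`, `  decl {...}`, and `  stmt`. Leaves are always shown in full, because there is nothing beneath them to elide.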

A table-driven, interactive, diagnostic programming system, CAPS, was produced by Wilcox,
Davis, and Tindall in 1976 [Wilcox et al., 76]. CAPS was a highly interactive, menu-driven
editor, diagnostic compiler, and interpreter which was used to prepare, debug, and execute simple
programs. Errors were diagnosed both at compile time and at run time. The analysis was performed
character by character; when an error occurred, a box was flashed around the invalid character,
any additional input data was ignored, and CAPS began an interaction with the student to find
the cause of the error. The user could back up the cursor to erase the box and resume editing, or
press a HELP key for automatically generated diagnostic assistance. The first press of the HELP
key displayed an error message; subsequent presses suggested possible repairs. CAPS employed a
recursive-descent parser with complete syntax checking. The internal representation was a list
of tokens, including spacing information and comments. Static semantic analysis was also
performed. Interpretive execution included a trace facility and run-time error analysis. CAPS
was table-driven and could be retargeted to other languages. New interpreters had to be
designed and implemented for a new language, but many modules could be reused, since the
internal structure had the same form regardless of the language. CAPS was available for
Fortran, PL/I, and COBOL. It was implemented on the PLATO IV Computer-Based Education
System at the University of Illinois at Urbana-Champaign.

In 1977, Teitelman produced INTERLISP, a display-oriented programmer's assistant
[Teitelman, 77]. It was a programming system for LISP, based on interpretation, with emphasis
on the debugging and execution of programs. An interpreter linked program pieces for execution.
A system debugger worked by interpreting code. Code pieces could be compiled, but the debugger
accessed only the interpreted code.

In the late 1970s, Teitelbaum designed and built the Cornell Program Synthesizer [Teitelbaum,
79], [Teitelbaum and Reps, 81]. The Synthesizer provides a syntax-directed programming
environment. It incorporates a grammar via templates which are predefined in the editor. Programs
are created top-down: new templates are inserted within the skeleton of previously entered
templates. However, phrases (assignment statements, expressions, and lists of variables) are
entered directly as text. Programs are translated into interpretable form during program entry.
Program development and testing can be interleaved; interpretation is suspended when an
unexpanded placeholder is encountered, and can be resumed after the placeholder is expanded. A
batch LR parser is used at the expression level. Errors are detected as soon as the user moves
the editing cursor out of a field, and the cursor is positioned at the point of the error.
Modifications are performed by the clip, delete, and insert commands. For example, to change a
statement into a block of two statements, the statement is clipped and replaced by the original
placeholder, a block template is selected, the clipped statement is inserted, and the new
statement is added. Semantic analysis is performed through an incremental attribute re-evaluation
scheme [Reps, 82]. The Synthesizer has been shown to be a good educational tool: it has been used
in introductory programming courses at several universities since June 1979. It has been used
with the language PL/CS [Conway and Constable, 76], a subset of PL/I.

The Synthesizer is limited in utility in that editing operations (modifying, moving, copying)
are cumbersome and not likely to be favored by experienced programmers. Simple text editing
commands are limited to phrases only. It cannot be used with pre-existing software, because no
way is provided to convert program fragments into template form. Comments can only be inserted at
selected locations, and are required in certain locations. The Synthesizer employs a hybrid
approach: recursive descent at high levels (templates), and parsing of character strings at low
levels (phrases).

Unlike the Cornell Program Synthesizer, the BABEL editor by Horton [Horton, 81]
presents a text-like interface to the user, and provides commands which operate on sequences of
characters. It performs lexical analysis on the input text, producing a list of tokens; it can
perform optional syntax checking based on an earlier Ghezzi and Mandrioli parsing algorithm
published in 1979 [Ghezzi and Mandrioli, 79], and can also perform optional semantic checking using
Reps' algorithm. The Ghezzi and Mandrioli algorithm operates on grammars of the class
$LR(1) \cap RL(1)$. That is, the grammar must be LR(1), and the reversed grammar, obtained by
reversing the right-hand sides of all of the productions, must also be LR(1). The algorithm uses
both left-thread and right-thread pointers to store parser state information; Horton replaced
some of the links with access routines which compute them, in order to save space. Comments are
handled by attaching them to the following token. Programs are not permitted to be incomplete,
and it is not possible to place unexpanded non-terminals in the tree (that is, there are no
placeholders). Horton defined a Language Description Language (LDL) to specify the language for
which an editor is to be built; a modified yacc parser-generator is used to produce the parse
tables.
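The reversal that defines the RL(1) condition is purely mechanical, as the Python sketch below shows. For illustration, productions are assumed to be represented as (left-hand side, right-hand-side tuple) pairs. Checking that the reversed grammar is actually LR(1) requires running a parser-generator on it, and is not attempted here.

```python
def reverse_grammar(productions):
    """Return the reversed grammar: each production A -> X1 X2 ... Xn
    becomes A -> Xn ... X2 X1; left-hand sides are unchanged."""
    return [(lhs, tuple(reversed(rhs))) for lhs, rhs in productions]


# A left-recursive expression grammar ...
grammar = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
]
# ... becomes right-recursive when its right-hand sides are reversed.
reversed_grammar = reverse_grammar(grammar)
```

Reversing twice returns the original grammar. In this small example, both the left-recursive original and its right-recursive reversal are LR(1), so the grammar lies in $LR(1) \cap RL(1)$.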

Horton reported that in BABEL, the running time required to perform syntax checking is 5
times that taken by the vi text editor [Joy and Horton, 80] to perform the same editing operation
(with no analysis). Semantic checking is 15 times slower when an executable statement is changed;
when a declaration is changed, it is slower by a “much larger factor” (unspecified). Horton
reports one example with semantic analysis which took 62 times as much processing. BABEL trees
without semantic information average 30 times the size of the equivalent text file; with semantic
information, the size increases to 300 times.

At Carnegie-Mellon, Medina-Mora and Feiler have produced an editor as part of the
Incremental Programming Environment (IPE) [Medina-Mora and Feiler, 81]. The environment consists
of several tools: the editor, translator, linker and loader, and debugger. The user interacts
with the entire system through the editor; other tools are invoked by the editor as needed. The
editor is syntax-directed; the programmer constructs his program by inserting templates, and
syntactic correctness is enforced. The editor represents the program internally as an abstract
syntax tree. An unparser translates the tree back into readable text to present the programmer
with his program. Semantic correctness is not enforced; semantic checking routines, which are
invoked automatically, perform further analysis.

Their system supports incremental program translation. The program is debugged with a
language-oriented debugger. A syntax-directed editor generator is used to prepare additional
editors. IPE provides an environment for a single programmer working on a single program. IPE
is a component of the Gandalf project, which coordinates programmers and versions of programs
[Habermann, 79]. The language supported is GC, a type-checked variation of C [Kernighan and
Ritchie, 78] with modular structure. Language descriptions have also been prepared for a subset
of Ada; Alfa, a non-Algol-like applicative language designed by Habermann; the system language
of Gandalf; and the grammatical description itself. An IPE prototype is running under UNIX on a
VAX. Medina-Mora and Feiler have found that new users need to get used to the structured editing
approach, that entering and editing expressions is more difficult than text editing, and that
preexisting code cannot be used unless a parser is built to perform a preprocessing pass to
convert it into tree form.

At the University of Illinois, Orailoglu has reviewed the design issues involved in the
development of hierarchical editors, and has produced an editor which employs a modified LL(1)
predictive parser [Orailoglu, 83]. He incorporates language-specific information through a
user-specified grammar with incomplete productions. The user interface of the editor permits
movement by characters, words, and lines, and from one node to another within the surrounding
tree structure. Text characters entered by the user are inserted at the position of the cursor;
a delete key deletes the character, word, line, or tree at which the cursor is positioned.
User-supplied pretty-printing information can be specified in the language description; comments,
however, are not pretty-printed. Lexical analysis, syntax analysis, and pretty printing are
performed character by character; detection of an error causes the remaining input to be
displayed in reverse video as the user continues typing. No semantic analysis is performed.

Shilling has extended Orailoglu's editor with a combination of follow-the-cursor parsing,
which parses only the characters up to the editing cursor, and soft templates, which appear
following the cursor to indicate the non-terminals that the parser expects but that have not yet
been completed [Shilling, 85]. The templates are soft in that the user is not obligated to follow
them. Shilling is also adding semantic analysis based upon attribute grammars and the attribute
update algorithm of Reps. The editor has language description grammars for Cobol, Fortran,
Pascal, C, and some other languages; it is implemented on several systems which run the 4.2BSD
UNIX operating system.

There have been other efforts toward improving the editing process or the software
development environment in similar ways [Fraser, 81], [Morris and Schwartz, 81], [Osterweil, 82,
83], and [Osterweil and Cowell, 83]. A number of reviews survey the field of editors, summarizing
many efforts [Meyrowitz and van Dam, 82], [Reid and Hanson, 81], and [van Dam and Rice, 71];
these will be of interest to the reader desiring more detailed background information.

The development of structure-oriented editors has been monitored by several individuals,
who have made the following observations. Waters, at MIT, notes that early implementations of
syntax-directed editors have been overly restrictive, and that the criticisms of them are
generally valid. He believes that the editors need time to mature, and could become quite
attractive to use at some point in the future. He is firm that text-oriented commands should not
be replaced, but augmented with structure-oriented commands [Waters, 82].

Meyrowitz and van Dam, at Brown University, note that a well-defined, consistent conceptual
model is needed, instead of the ad hoc methods used today. Documentation is needed which
explains the conceptual model and the user interface. A clear, concise, orthogonal user interface
that is easy to learn is needed; today, interfaces are haphazard and contradictory. The sharing
of project information and files among a group in a controlled way is also needed [Meyrowitz and
van Dam, 82].

The SAGA editor addresses many of these points. Because it is based upon an incremental
parser, its user interface is much more flexible than those provided by earlier structure editors.
Text-oriented commands have not been replaced, but retained and augmented with
structure-oriented commands. The user interface is concise and orthogonal, permitting the
specification of groups of characters, tokens, lines, and sub-trees, and applying every built-in
operation to all argument types for which it makes sense. Comments are handled like any other
token, and are not treated as a special case as in all other systems to date. We believe that we
also have a solution to the sharing of project information and files among a group through an
Integrated Modular Environment [Kirslis et al., 85]; we have defined a model, representation, and
implementation for an environment which can be used with many standard software development tools
available today, and with which additional benefits are possible when it is combined with a
language-oriented editor such as the SAGA editor. The approach we have taken has already yielded
a prototype language-oriented editor and environment. We believe that the editor and support
environment have practical application in software development, and that it will be possible to
refine these prototype tools into a usable system with significant benefit to software engineers.
CHAPTER 3

SHIFT-REDUCE  PARSING 


We begin our discussion of parsing with a quick review of shift-reduce parsing, also called
LR(k) parsing; we then describe a left threading of a parse tree which will be of great use when we
turn our attention to incremental parsing.1 In LR(k) parsing, the k refers to the number of
symbols at the head of the input string that are passed to the parser to enable it to determine
the parsing action; these symbols are termed the lookahead of the parser. The languages in which
we are interested can be defined by LR grammars with k = 1, so we will restrict our discussion to
this class.2 A subset of LR(1) grammars, termed LALR(1) grammars, is of particular interest to
us, since the parse tables produced for this class are much smaller and better suited for
practical use.


3.1.  Preliminary  Definitions 

LR parsers are driven from tables which can be algorithmically generated from a formal
specification of the language. Programs which apply these algorithms and produce these tables
are called parser-generators. Since the specific techniques and algorithms used by
parser-generators are well documented elsewhere and are not necessary for the understanding of
the language-oriented editing being presented in this dissertation, a discussion of these
techniques is omitted. The reader may consult [Aho and Ullman, 77] for more information about
this topic.

1 Readers unfamiliar with LR(k) parsing can find an introduction to the subject in [Aho and
Ullman, 77], especially Chapter 5. In this discussion, we assume the reader is familiar with
shift/reduce parsing notation as defined in [Aho and Ullman, 72].

2 If there is an interest in a language described by an LR grammar with k > 1, and if a
parser-generator is available which will process the grammar, it would be a simple matter to
modify the editor to pass to the parser a list of lookaheads. The incremental parsing algorithm is
not affected.

The formal specification of a language can take a number of forms. Two of the parser-generators used by the SAGA project [Noonan and Collins, 84], [Mickunas, 86] use a context-free grammar in Backus-Naur Form (BNF) as a specification. A third parser-generator, presently under construction, will take an extended BNF specification [Beshers, 84]. For simplicity, we will only discuss BNF syntax, since extended BNF can always be rewritten in this form.

BNF notation provides a means to write a formal description of a language for which we wish to construct a parser. The description is given as a context-free grammar $G$, which is defined to be the four-tuple $(N, \Sigma, P, S)$, where $N$ is the finite non-empty set of non-terminal symbols, $\Sigma$ is the finite set of terminal symbols, $P$ is the finite set of productions $A \rightarrow \alpha$, where $A \in N$ and $\alpha \in (N \cup \Sigma)^*$, and $S \in N$ is a distinguished non-terminal termed the start symbol [Hopcroft and Ullman, 79]. The sets $N$ and $\Sigma$ are disjoint; that is, $N \cap \Sigma = \emptyset$ (the empty set). Additionally, $N \cup \Sigma$ is conventionally denoted as $V$. See Figure 3-1 for an example of a grammar.


    <S> ::= <E>                (1)
    <E> ::= <E> + <E>          (2)
    <E> ::= <E> * <E>          (3)
    <E> ::= ident              (4)
    <E> ::= integer            (5)
    <E> ::= ( <E> )            (6)

Figure 3-1: An (Ambiguous) Grammar for Simple Expressions. $G = (N, \Sigma, P, \langle S\rangle)$, where $N = \{\langle S\rangle, \langle E\rangle\}$, $\Sigma = \{\mathit{ident}, \mathit{integer}, +, *, (, )\}$, $P$ is shown above, and $\langle S\rangle$ is the start symbol.
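As a concrete illustration, the grammar of Figure 3-1 can be written down directly as the four-tuple defined above. The sketch below uses Python, and the names N, SIGMA, P, S, V and is_well_formed are hypothetical; it simply checks the conditions of the definition, including that $N \cap \Sigma = \emptyset$ and that every right-hand side is a string over $V$.

```python
# Sketch: the grammar of Figure 3-1 as the four-tuple G = (N, Sigma, P, S).
# All names here are illustrative, not taken from the SAGA sources.

N = {"<S>", "<E>"}                                  # non-terminal symbols
SIGMA = {"ident", "integer", "+", "*", "(", ")"}    # terminal symbols
S = "<S>"                                           # start symbol

# Productions, numbered as in Figure 3-1: number -> (A, alpha)
P = {
    1: ("<S>", ("<E>",)),
    2: ("<E>", ("<E>", "+", "<E>")),
    3: ("<E>", ("<E>", "*", "<E>")),
    4: ("<E>", ("ident",)),
    5: ("<E>", ("integer",)),
    6: ("<E>", ("(", "<E>", ")")),
}

V = N | SIGMA   # the vocabulary V = N union Sigma


def is_well_formed(n, sigma, p, s):
    """Check the definition: N is non-empty, N and Sigma are disjoint,
    S is in N, and each production A -> alpha has A in N and alpha in V*."""
    if not n or (n & sigma) or s not in n:
        return False
    vocabulary = n | sigma
    return all(a in n and all(x in vocabulary for x in alpha)
               for a, alpha in p.values())


print(is_well_formed(N, SIGMA, P, S))  # True
print(sorted(V))
```

Representing a production as a pair of a left-hand side and a tuple makes the empty right-hand side (the empty string) simply the empty tuple.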
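A string over $V$ is a finite sequence of symbols from $V$. $V^*$ denotes the set of all strings over $V$, including the empty string $\epsilon$, with $V^+ = V^* - \{\epsilon\}$. If $\alpha, \gamma, \beta \in V^*$ and $A \in N$, then $\alpha A \beta$ directly derives $\alpha \gamma \beta$, denoted $\alpha A \beta \Rightarrow \alpha \gamma \beta$, where $\Rightarrow$ is a relation between strings in $V^*$, and $A \rightarrow \gamma$ is a production in the grammar. (The production $A \rightarrow \gamma$ is applied to the string $\alpha A \beta$ to yield $\alpha \gamma \beta$.)

If $\alpha_1, \alpha_2, \ldots, \alpha_n$ are strings in $V^*$, and

$$\alpha_1 \Rightarrow \alpha_2 \Rightarrow \alpha_3 \Rightarrow \cdots \Rightarrow \alpha_{n-1} \Rightarrow \alpha_n,$$

then $\alpha_1$ derives $\alpha_n$, denoted $\alpha_1 \overset{*}{\Rightarrow} \alpha_n$. By convention, $\alpha \overset{*}{\Rightarrow} \alpha$.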
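The directly-derives relation can be checked mechanically. In this minimal sketch, strings over $V$ are Python tuples of symbols, productions are (left side, right side) pairs, and the function names directly_derives and derives are our own; derives tests a chain $\alpha_1 \Rightarrow \alpha_2 \Rightarrow \cdots \Rightarrow \alpha_n$ one link at a time.

```python
# Sketch of the "directly derives" relation for the grammar of Figure 3-1.
# Strings over V are tuples of symbols; names are illustrative.

PRODUCTIONS = [
    ("<S>", ("<E>",)),
    ("<E>", ("<E>", "+", "<E>")),
    ("<E>", ("<E>", "*", "<E>")),
    ("<E>", ("ident",)),
    ("<E>", ("integer",)),
    ("<E>", ("(", "<E>", ")")),
]


def directly_derives(x, y, productions=PRODUCTIONS):
    """True if x = alpha A beta and y = alpha gamma beta for some
    production A -> gamma, i.e. x => y."""
    for i, symbol in enumerate(x):
        for lhs, rhs in productions:
            if symbol == lhs and x[:i] + rhs + x[i + 1:] == y:
                return True
    return False


def derives(chain, productions=PRODUCTIONS):
    """True if chain[0] => chain[1] => ... => chain[-1], so that
    chain[0] =>* chain[-1]; a one-element chain is the case alpha =>* alpha."""
    return all(directly_derives(a, b, productions)
               for a, b in zip(chain, chain[1:]))


chain = [
    ("<S>",),
    ("<E>",),
    ("<E>", "+", "<E>"),
    ("ident", "+", "<E>"),
    ("ident", "+", "integer"),
]
print(derives(chain))                                     # True
print(directly_derives(("<E>",), ("ident", "+", "<E>")))  # False: needs two steps
```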

A derivation tree $D$ for a context-free grammar $G = (N, \Sigma, P, S)$ is a labeled ordered tree in which each node is labeled by a symbol from $V$; if $A$ labels an interior node and $B_1, \ldots, B_n$ label its immediate descendants from left to right, then $A \rightarrow B_1 B_2 \cdots B_n$ is a production in $P$.

The frontier of a derivation tree is the string $w = w_1 w_2 \cdots w_n$, where the $w_i \in \Sigma$ are the labels of the terminal nodes read left to right.
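A derivation tree can be represented as nested (label, children) pairs. The sketch below, whose names are hypothetical, checks that every interior node together with its children forms a production of Figure 3-1, and reads the frontier off the leaves from left to right. The example tree is one of the two derivation trees the ambiguous grammar permits for ident * ident + ident, namely the one in which ident * ident is grouped first.

```python
# Sketch: derivation trees as (label, children) pairs for Figure 3-1's grammar.
# A leaf has an empty list of children. Names are illustrative.

PRODUCTIONS = {
    ("<S>", ("<E>",)),
    ("<E>", ("<E>", "+", "<E>")),
    ("<E>", ("<E>", "*", "<E>")),
    ("<E>", ("ident",)),
    ("<E>", ("integer",)),
    ("<E>", ("(", "<E>", ")")),
}


def leaf(label):
    return (label, [])


def is_derivation_tree(tree):
    """Every interior node A with children B1 ... Bn must match a
    production A -> B1 B2 ... Bn."""
    label, children = tree
    if not children:
        return True
    rhs = tuple(child[0] for child in children)
    return (label, rhs) in PRODUCTIONS and all(map(is_derivation_tree, children))


def frontier(tree):
    """Labels of the leaves, read from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [symbol for child in children for symbol in frontier(child)]


def e_ident():
    return ("<E>", [leaf("ident")])


# The tree in which ident * ident is grouped first.
tree = ("<S>", [("<E>", [("<E>", [e_ident(), leaf("*"), e_ident()]),
                         leaf("+"), e_ident()])])

print(is_derivation_tree(tree))   # True
print(frontier(tree))             # ['ident', '*', 'ident', '+', 'ident']
```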

Given a context-free grammar $G$ with start symbol $S$, the language generated by $G$, denoted $L(G)$, consists of all strings of terminals $w$ such that $S \overset{*}{\Rightarrow} w$. A sentential form $\alpha$ of $G$ is a string of terminals and/or non-terminals such that $S \overset{*}{\Rightarrow} \alpha$.

A rightmost derivation is a derivation in which the rightmost non-terminal in a sentential form is replaced at each step in the derivation. Such a derivation is denoted by $\alpha \underset{rm}{\Rightarrow} \beta$ if a single step is taken, $\alpha \overset{*}{\underset{rm}{\Rightarrow}} \beta$ if zero or more steps are taken, and $\alpha \overset{+}{\underset{rm}{\Rightarrow}} \beta$ if at least one and possibly more steps are taken. A right sentential form $\alpha$ is a sentential form generated from the start symbol $S$ by a rightmost derivation.
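A rightmost derivation can be generated by always rewriting the rightmost non-terminal. The following sketch (hypothetical names) applies a sequence of productions, numbered as in Figure 3-1, and lists the right sentential forms it passes through. Read in reverse, the sequence [1, 2, 4, 3, 4, 4] gives [4, 4, 3, 4, 2, 1], the order in which an LR parser performs the reductions for ident * ident + ident when * groups first.

```python
# Sketch: a rightmost derivation driven by production numbers (Figure 3-1).
# Names are illustrative.

PRODUCTIONS = {
    1: ("<S>", ("<E>",)),
    2: ("<E>", ("<E>", "+", "<E>")),
    3: ("<E>", ("<E>", "*", "<E>")),
    4: ("<E>", ("ident",)),
    5: ("<E>", ("integer",)),
    6: ("<E>", ("(", "<E>", ")")),
}
NONTERMINALS = {"<S>", "<E>"}


def rightmost_step(form, number):
    """Rewrite the rightmost non-terminal of `form` with production `number`."""
    lhs, rhs = PRODUCTIONS[number]
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NONTERMINALS:
            if form[i] != lhs:
                raise ValueError(f"production {number} cannot rewrite {form[i]}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def rightmost_derivation(numbers, start=("<S>",)):
    """Return the right sentential forms of S =>rm ... =>rm w."""
    forms = [start]
    for number in numbers:
        forms.append(rightmost_step(forms[-1], number))
    return forms


for form in rightmost_derivation([1, 2, 4, 3, 4, 4]):
    print(" ".join(form))
```

The printed forms are <S>, <E>, <E> + <E>, <E> + ident, <E> * <E> + ident, <E> * ident + ident, and finally ident * ident + ident.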

It can be determined whether a string of terminal symbols from $G$ is a sentence in $L(G)$ by performing the reverse of a rightmost derivation on the terminal string. A program that performs such an analysis for a restricted class of context-free languages is termed an LR parser. It consists of a driver routine to perform the parsing, a set of named or numbered parse tables to direct the parse, a parse stack on which to store the intermediate results, and an input from which to read the string to be parsed.³ The parser is assigned a parse state which identifies one of the parse tables as the table to reference at the current step in the parse. One of these states is distinguished as the initial state in which the parser is to begin execution.

A configuration of an LR parser for a grammar $G$ is an ordered pair $(K, I)$, where $K$ represents the contents of the parse stack and $I$ is the input not yet read by the parser. Each entry on the parse stack $K$ consists of a symbol $Y$ followed by a parse state $P$, where

$$Y \in V \cup \{\mathit{bof}, \mathit{eof}\}.$$

The special symbols bof and eof serve as left and right pads, respectively, to delimit the contents of the stack and the input.⁴


3.2. Shift-Reduce Parsing

Given a grammar that describes a language, and a parser-generator to produce parse tables from that grammar, we can build a parser which uses those tables to analyze strings in the language. We illustrate the operation of this parser with an example of the parsing of a string in the language generated by the grammar given in Figure 3-1. In this grammar, the ident symbol represents an alphanumeric identifier, and the integer symbol represents an integer.


³ The LR in LR parser stands for Left-to-right scan of the input and construction of a Rightmost derivation in reverse.

⁴ The use of a left and right pad theoretically restricts us to strict deterministic context-free languages with end markers, and also to less than full LR parsers, since the parser should only check that the input is exhausted once it has decided to accept the input already scanned. But this restriction is of no practical significance. During initialization of the editor, the parser is called with the empty string to produce an initial tree which provides a uniform context for subsequent editing. An eof symbol is always present at the right end of the parse tree frontier, and so is always passed to the parser whenever the end of the parse tree is reached. While the parser will always be passed both a parse state and this eof symbol when the input has been exhausted, it is not required to look at this symbol, and can behave in the manner described above. In addition, grammars need not include the end markers; the choice is left to the parser-generating system.
Assume that the grammar given in Figure 3-1 has been processed by a parser-generating system to produce the set of parse tables given in Table 3-1. We are not concerned here with how the tables are produced from the grammar; such techniques are well understood, and the reader can find these details in [Aho and Ullman, 77] if interested. In Table 3-1, each row is a single parse table. The parser begins in state 1, and enters other states as directed by entries in the table for its current state. There is a single column for each terminal and non-terminal symbol in the grammar. The parser reads a symbol from the input; this symbol becomes the lookahead symbol, which is used to select an entry from the parse table to direct the parser's next step.

Given this set of parse tables and an input string to be parsed, the parser begins in an initial configuration $(K, I)$ in which the stack component $K$ holds only the bof (beginning of file) token and the initial parse state, and the second component $I$ holds the input string. At each step in the parse, the parse state on the top of the stack is the current state of the parser, and the first token in the input string is the lookahead symbol.


    State   Symbol   Non-blank entries (left to right)
      1       -      s10   s4    r2    s2    s3
      2       S      acc
      3       E      s6    s5    r3
      4       (      s10   s4    s7
      5       *      s10   s4    s9
      6       +      s10   s4    s8
      7       E      s6    s5    s11
      8       E      r4    s5    r4
      9       E      r5    r5    r5    r5
     10       id     r6    r6    r6    r6
     11       )      r8    r8    r8    r8

Table 3-1: LR(0) parsing tables for the simple expression grammar given in Figure 3-1.
Four types of parsing actions are shown in the parse tables: shift (s), reduce (r), accept (acc), and error (blank). When a shift action is indicated, the parser removes the input token from the second component of the tuple and pushes it onto the stack (the first component) together with the state given in the parse table with the shift action. This state becomes the new state of the parser.

When a reduce action is indicated, the parser uses the associated number to determine which production rule of the grammar is to be used to perform the reduction. The parser pops the entries corresponding to the right-hand side of the rule off the stack, and then prepends the token code corresponding to the left-hand side of the production rule to the head of the input in the second component.

Standard parsers at this point typically replace the tokens and states corresponding to the right-hand side of the production rule with the non-terminal on the left-hand side of the production rule and the new parse state. The new state is determined by applying a goto function to the state uncovered on the parse stack after the right-hand side of the production rule has been removed, and to the non-terminal on the left-hand side of the production. The goto function will always return a shift action, and there can never be an error when it is computed. But by prepending the non-terminal to the head of the input stream instead of continuing with the reduction in this manner, the next action that the parser performs will be exactly this goto function. By having the parser treat the goto function as an ordinary shift action, instead of separating it out as a special case, we gain the ability to have non-terminal nodes treated identically to terminal nodes. This uniformity is important, since it permits the SAGA editor to pass previously parsed sub-trees to the parser intact, to be inserted into the parse tree at a new location. The contents of such a (potentially large) sub-tree need not be reparsed, saving computation time, and we can provide an editing command to move sub-trees around easily in the editor.
Figure 3-2: Configurations through which the parser passes during an LR parse of the input string a * x + b. The current parse state is stored on the top of the stack, and the lookahead symbol is at the head of the input. The simple expression grammar is given in Figure 3-1, and the set of parse tables produced from this grammar in Table 3-1.


An accept action tells the parser to terminate and accept the input string as a legal sentence in the grammar. When the parser terminates in this way, the start symbol of the grammar will be the only token on the parse stack (not counting the bof token, which is always present), and the eof token will be the only token remaining in the input string.

A blank entry in the table indicates that an error has occurred; in this case the parser terminates and rejects the string as a non-sentence in the language.⁵

⁵ In the case of the SAGA editor, the parser invokes an error handler to save the information necessary to enable the parser to resume the parse at a later time, and to enable the editor to display this portion of the tree in the meantime.
[Figure 3-3 panels: (a) initial configuration of the parser, input = a * x + b eof; (b) parser state before move 1b; (c) parser state before move 3; (d) parser state before move 4a. Legend: left thread pointers; leftson and sibling pointers; parent pointers.]

Figure 3-3: Construction of the parse tree for the first few moves shown in Figure 3-2. The numbers in the nodes refer to the state of the parser just after a shift action was performed with that node.


Let us now consider the input string a * x + b. Figure 3-2 shows the moves that our parser makes at each step in the parse. The first row in the table shows the initial configuration of the parser. At each step in the parse, the rightmost number in the first component is the parser's current state, and the leftmost symbol in the second component is the lookahead symbol at the head of the unread input string. At the conclusion of this successful parse, the stack contains only the bof symbol and the start symbol, and the input string only the eof symbol.
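The shift-reduce driver described in this section, including the treatment of the goto step as a shift of a non-terminal prepended to the input, can be sketched as follows. The parse table is our own hand-built table for the grammar of Figure 3-1, not a copy of Table 3-1: it resolves the ambiguity by letting * bind tighter than + and making both left-associative, and its reduce numbers are Figure 3-1's production numbers. All function and variable names are hypothetical.

```python
# Sketch of a table-driven shift-reduce driver in the style of Section 3.2.
# The table is hand-built for Figure 3-1's grammar (not a copy of Table 3-1):
# '*' binds tighter than '+', both are left-associative, and reduce numbers
# are Figure 3-1's production numbers. All names are illustrative.

PRODUCTIONS = {  # number -> (left-hand side, length of right-hand side)
    1: ("<S>", 1), 2: ("<E>", 3), 3: ("<E>", 3),
    4: ("<E>", 1), 5: ("<E>", 1), 6: ("<E>", 3),
}
FOLLOW = ("+", "*", ")", "eof")      # lookaheads on which <E> may be reduced
OPERAND = {"ident": ("s", 10), "integer": ("s", 12), "(": ("s", 4)}


def reduce_on_all(number):
    return {t: ("r", number) for t in FOLLOW}


TABLE = {  # state -> {symbol: action}; a missing entry is an error (blank)
    1: {**OPERAND, "<S>": ("s", 2), "<E>": ("s", 3)},
    2: {"eof": ("acc",)},
    3: {"+": ("s", 6), "*": ("s", 5), "eof": ("r", 1)},
    4: {**OPERAND, "<E>": ("s", 7)},
    5: {**OPERAND, "<E>": ("s", 9)},
    6: {**OPERAND, "<E>": ("s", 8)},
    7: {"+": ("s", 6), "*": ("s", 5), ")": ("s", 11)},
    8: {**reduce_on_all(2), "*": ("s", 5)},   # E + E: shift on '*'
    9: reduce_on_all(3),                      # E * E
    10: reduce_on_all(4),                     # ident
    11: reduce_on_all(6),                     # ( E )
    12: reduce_on_all(5),                     # integer
}


def parse(tokens):
    """Return (accepted, trace), where trace lists each configuration (K, I)."""
    stack = [("bof", 1)]                 # K: (symbol, state) pairs
    inp = list(tokens) + ["eof"]         # I: unread input
    trace = []
    while True:
        trace.append((list(stack), list(inp)))
        action = TABLE[stack[-1][1]].get(inp[0])
        if action is None:               # blank entry: reject the string
            return False, trace
        if action[0] == "acc":
            return True, trace
        if action[0] == "s":             # shift the lookahead with the new state
            stack.append((inp.pop(0), action[1]))
        else:                            # reduce: pop the right-hand side and
            lhs, length = PRODUCTIONS[action[1]]
            del stack[-length:]          # prepend the left-hand side to the
            inp.insert(0, lhs)           # input; the goto is then a shift


accepted, trace = parse(["ident", "*", "ident", "+", "ident"])
for k, i in trace:
    print(" ".join(f"{sym} {st}" for sym, st in k), "|", " ".join(i))
print(accepted)
```

The printed trace plays the role of Figure 3-2: the last configuration holds only bof and <S> on the stack and eof in the input, and the reductions occur in the order 4, 4, 3, 4, 2, 1.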

3.3. Constructing the Parse Tree

The approach used by the SAGA editor's parser is similar to the one described above, except that a parse tree actually is constructed during the parsing operation. In addition, no explicit parse stack is used by the editor's parser. Instead, the parse stack is directly incorporated into the parse tree as it is constructed. Each parse tree node is augmented with a left thread attribute, which contains a pointer to the node that would be directly beneath this one on a parse stack if one were used. A top-of-stack variable points to the node which would be on the top of this stack at each point in the parse. Figure 3-3 illustrates the parse tree and input string manipulated by the parser for the first few moves of the parse given in Figure 3-2. Figure 3-4 presents the completed parse tree constructed by the parser for this input string.
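The following sketch shows one way to replace an explicit parse stack with left threads. Each node records the state in which it was shifted and a pointer (lthread) to the node beneath it, and a single variable, top, plays the role of the top-of-stack. A reduction walks lthread links to collect the children, builds the non-terminal node, and prepends it to the input, so that it is later shifted like any token. The table is a hand-built illustration for Figure 3-1's grammar (not a copy of Table 3-1), and the class and function names are hypothetical.

```python
# Sketch: building the parse tree with left threads instead of a parse stack
# (Section 3.3). The table is a hand-built illustration for Figure 3-1's
# grammar, not a copy of Table 3-1; all names are illustrative.

PRODUCTIONS = {1: ("<S>", 1), 2: ("<E>", 3), 3: ("<E>", 3),
               4: ("<E>", 1), 5: ("<E>", 1), 6: ("<E>", 3)}
FOLLOW = ("+", "*", ")", "eof")
OPERAND = {"ident": ("s", 10), "integer": ("s", 12), "(": ("s", 4)}


def reduce_on_all(number):
    return {t: ("r", number) for t in FOLLOW}


TABLE = {
    1: {**OPERAND, "<S>": ("s", 2), "<E>": ("s", 3)},
    2: {"eof": ("acc",)},
    3: {"+": ("s", 6), "*": ("s", 5), "eof": ("r", 1)},
    4: {**OPERAND, "<E>": ("s", 7)},
    5: {**OPERAND, "<E>": ("s", 9)},
    6: {**OPERAND, "<E>": ("s", 8)},
    7: {"+": ("s", 6), "*": ("s", 5), ")": ("s", 11)},
    8: {**reduce_on_all(2), "*": ("s", 5)},
    9: reduce_on_all(3), 10: reduce_on_all(4),
    11: reduce_on_all(6), 12: reduce_on_all(5),
}


class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)
        self.state = None     # parser state just after this node was shifted
        self.lthread = None   # node directly beneath this one on the virtual stack


def parse_tree(tokens):
    """Parse `tokens`; return the node for the start symbol."""
    top = Node("bof")                    # the top-of-stack variable
    top.state = 1
    inp = [Node(t) for t in tokens] + [Node("eof")]
    while True:
        action = TABLE[top.state].get(inp[0].symbol)
        if action is None:
            raise SyntaxError(f"unexpected {inp[0].symbol!r}")
        if action[0] == "acc":
            return top
        if action[0] == "s":             # shift: thread the node onto the stack
            node = inp.pop(0)
            node.lthread = top
            node.state = action[1]
            top = node
        else:                            # reduce: follow lthread to pop children
            lhs, length = PRODUCTIONS[action[1]]
            children = []
            for _ in range(length):
                children.append(top)
                top = top.lthread        # children keep their own lthread
            inp.insert(0, Node(lhs, reversed(children)))


def show(node):
    """Bracketed rendering of a sub-tree."""
    if not node.children:
        return node.symbol
    return "(" + " ".join([node.symbol] + [show(c) for c in node.children]) + ")"


root = parse_tree(["ident", "*", "ident", "+", "ident"])
print(show(root))
```

Because popping only moves top and never clears a child's lthread, every node keeps the thread it received when it was shifted; this is what lets the finished tree retain the parser's intermediate stacks.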

In Figure 3-4, the reader should take note of the left thread pointers, shown as solid arrows, which connect the nodes of the tree. The number in each node is the new parse state of the parser just after that node is shifted. By storing these two pieces of information in the parse tree, each and every configuration through which the parser has passed during the entire parse is captured. By setting the top-of-stack variable to any of these nodes, and the parser state to the state found in that node, we recreate the exact configuration the parser was in at this point in the parse, just as though we had begun the parse from scratch and proceeded up to this point, pausing just after this node had been shifted. This ability to quickly recreate any intermediate configuration of the parse is central to the editor's ability to efficiently and incrementally reparse a user's modifications as they are made. For this approach to prove feasible, the parser must also be able to terminate the reparse once the modification has been incorporated, rather than reparsing the entire remainder of the program. The incremental parser also has this second property, but we will defer discussion of it until Chapter 5, when the incremental parsing algorithm is discussed in detail.

Figure 3-4: The parse tree constructed for a * x + b.
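Recreating a configuration from a threaded node amounts to walking the left threads down to bof. In this sketch the nodes are wired by hand rather than produced by a parser, their state numbers are illustrative values for the expression grammar, and the names Node and configuration are hypothetical. Only the stack component $K$ is recovered here; the remaining input is the part of the tree frontier to the right of the chosen node.

```python
# Sketch: recreating the parse stack from any node of a threaded parse tree
# (Section 3.3). The nodes are wired by hand and the state numbers are
# illustrative values; Node and configuration are hypothetical names.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    symbol: str
    state: int                         # parser state just after the shift
    lthread: Optional["Node"] = None   # node beneath this one on the stack


def configuration(node: Node) -> List[Tuple[str, int]]:
    """Stack contents, bottom first, just after `node` was shifted."""
    stack = []
    while node is not None:
        stack.append((node.symbol, node.state))
        node = node.lthread
    stack.reverse()
    return stack


# Part of the threaded tree for a * x + b.
bof = Node("bof", 1)
e_a = Node("<E>", 3, bof)      # a, already reduced to <E>
star = Node("*", 5, e_a)
e_ax = Node("<E>", 3, bof)     # a * x, reduced to <E>
plus = Node("+", 6, e_ax)
e_b = Node("<E>", 8, plus)     # b, reduced to <E>

print(configuration(star))   # [('bof', 1), ('<E>', 3), ('*', 5)]
print(configuration(e_b))    # [('bof', 1), ('<E>', 3), ('+', 6), ('<E>', 8)]
```

To resume parsing from such a point, a parser would set its top-of-stack variable to the chosen node and its current state to that node's state; note that e_a and e_ax share the same lthread, so the threads form a tree of stacks rather than a single list.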
CHAPTER 4

PARSE TREE STRUCTURE


In Chapter 3, construction of a parse tree was discussed. In this chapter, we will look at some possible parse tree structures and then decide upon the one to be used with the incremental parsing algorithm to be presented in Chapter 5. To avoid implementation complexity, it is desirable to have the editor use the parse tree structure directly, instead of maintaining both a parse tree and an equivalent text representation and then maintaining the consistency between them. Therefore, the parse tree must be able to support both the incremental parsing algorithm and the editor's command interpreter and display module.

The parse tree proposed by [Ghezzi and Mandrioli, 80] is sufficient to support their parsing algorithm, but is not suitable for use with an editor, since a number of required operations cannot be performed efficiently using their structure. In particular, the editor's command interpreter requires the ability to move from node to node throughout the tree in response to user commands which select token sequences and sub-trees for editing. In addition, the editor's display module needs a convenient and efficient way to sequentially access the terminal nodes in the tree to generate the display; it is much too inefficient to force a walk through the internal structure of the tree in order to retrieve these terminal nodes.

We will begin with a summary of common tree traversal methods for both binary trees and trees. By tree, we mean an ordered tree in which each non-terminal node has one or more children in a particular order from left to right, as opposed to an oriented tree in which no ordering is imposed upon the children. By binary tree, we mean an ordered tree in which each internal node has at most two children, distinguished as the left child and the right child. Then we will review the parse tree structure proposed by Ghezzi and Mandrioli and show what access routines their structure requires to support the traversals, and what difficulties arise. Lastly, we propose improvements to their linking structure, and show how the modified linking structure better supports the tree traversals and editor tree access.

4.1. Traversing the Parse Tree

Tree traversal algorithms visit each node in the tree in some order. Recursive or iterative programs can easily be written which visit each node and its sub-trees. Three common traversal methods for binary trees are listed in Table 4-1, headed by some of the names commonly used to refer to them. These traversals assume that each internal node in the binary tree contains pointers to its left and right children. The editor's parse tree is actually implemented as a binary tree by using a standard correspondence between trees and binary trees [Knuth, 73]. Therefore, programs which need to visit each node of the tree and can use a binary tree traversal may do so if a simpler program results. However, the parse tree should conceptually be thought of as a (non-binary) tree, since each internal node has one, two or more children, and a given traversal algorithm will visit the nodes in a tree in a different order, depending upon the way in which the tree is viewed. Therefore, we will not speak of the left and right sons of a node, but of the children of a node, and a node with n children will have n subtrees to be visited. Using this tree structure, the preorder and postorder traversals may be described as shown in Table 4-2. There is no simple equivalent for inorder, since the root node needs to be visited somewhere in between the visits to the first and last children.

    Preorder (depth-first order):
        visit the node
        traverse the left subtree
        traverse the right subtree

    Inorder (symmetric order, lexicographic order):
        traverse the left subtree
        visit the node
        traverse the right subtree

    Postorder (endorder, bottom-up order):
        traverse the left subtree
        traverse the right subtree
        visit the node

Table 4-1: Some binary tree traversal methods.
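The dependence of the visiting order on how the tree is viewed can be checked with a small sketch of the leftmost-child/right-sibling correspondence between trees and binary trees [Knuth, 73]. Class and function names are hypothetical. Viewed as a binary tree, preorder visits the nodes in the same order as tree preorder, and inorder visits them in the same order as tree postorder, whereas binary postorder gives yet another order.

```python
# Sketch: binary tree traversals (Table 4-1), tree traversals (Table 4-2),
# and the leftmost-child / right-sibling correspondence between trees and
# binary trees. All names are illustrative.


class TreeNode:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


class BinNode:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right


def to_binary(siblings):
    """Left link = leftmost child, right link = next sibling."""
    result = None
    for node in reversed(siblings):
        result = BinNode(node.label, to_binary(node.children), result)
    return result


def tree_preorder(t):
    yield t.label
    for child in t.children:
        yield from tree_preorder(child)


def tree_postorder(t):
    for child in t.children:
        yield from tree_postorder(child)
    yield t.label


def bin_preorder(b):
    if b:
        yield b.label
        yield from bin_preorder(b.left)
        yield from bin_preorder(b.right)


def bin_inorder(b):
    if b:
        yield from bin_inorder(b.left)
        yield b.label
        yield from bin_inorder(b.right)


def bin_postorder(b):
    if b:
        yield from bin_postorder(b.left)
        yield from bin_postorder(b.right)
        yield b.label


tree = TreeNode("A", TreeNode("B", TreeNode("D"), TreeNode("E")), TreeNode("C"))
binary = to_binary([tree])
print("".join(tree_preorder(tree)), "".join(bin_preorder(binary)))   # ABDEC ABDEC
print("".join(tree_postorder(tree)), "".join(bin_inorder(binary)))   # DEBCA DEBCA
print("".join(bin_postorder(binary)))                                # EDCBA
```

For the tree A(B(D, E), C), tree postorder and binary inorder both yield D E B C A, while binary postorder yields E D C B A.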

4.2. The Ghezzi and Mandrioli Parse Tree

In the Ghezzi and Mandrioli parse tree, nodes are linked together by four types of links: lthread (left thread), parent, rmost (rightmost sibling) and rdescend (rightmost descendant). These links are all that are required to support an incremental parser; thus the leftmost son and sibling links shown in the parse trees presented in Chapter 3 do not exist in this tree. The lthread link is identical to the left thread link previously described; it points to the node which lay directly beneath this one on the (virtual) parse stack when this node was shifted. Each node in the tree points to its parent through its parent link, and to the rightmost sibling in its production through its rmost link. Lastly, each node points to the terminal node at the right end of its sub-tree through its rdescend link.

Unfortunately, we have no pointers to any of the subtrees to use for the tree traversals, except for the single pointer to the rightmost descendant. So either a new method for traversal must be devised which uses only those links which are available, or the left and right son links of a binary tree must be simulated in some fashion. By providing functions to return the leftson and right sibling of a node, we can implement the above traversals without requiring any other links in the tree.

    Preorder:
        visit the node
        traverse the subtrees

    Postorder:
        traverse the subtrees
        visit the node

Table 4-2: Equivalent tree traversal methods for preorder and postorder traversals.
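The traversals of Table 4-2 need nothing beyond two access functions. In the sketch below, preorder and postorder take leftson and sibling as parameters; for the example they are stubbed out with dictionary lookups, whereas in the Ghezzi and Mandrioli tree they would be computed from the parent, lthread, rmost and rdescend links as described in Sections 4.2.1 and 4.2.2. All names are hypothetical.

```python
# Sketch: the traversals of Table 4-2 written against two access functions.
# Here leftson and sibling are stubbed with dictionaries; in the parse tree
# they would be computed from the parent, lthread, rmost and rdescend links.
# All names are illustrative.
from typing import Callable, List, Optional

Access = Callable[[str], Optional[str]]


def preorder(node: str, leftson: Access, sibling: Access, out: List[str]) -> None:
    out.append(node)                          # visit the node
    child = leftson(node)
    while child is not None:                  # traverse the subtrees
        preorder(child, leftson, sibling, out)
        child = sibling(child)


def postorder(node: str, leftson: Access, sibling: Access, out: List[str]) -> None:
    child = leftson(node)
    while child is not None:                  # traverse the subtrees
        postorder(child, leftson, sibling, out)
        child = sibling(child)
    out.append(node)                          # visit the node


# The tree A(B(D, E), C), described only by its leftson and sibling links.
LEFTSON = {"A": "B", "B": "D"}
SIBLING = {"B": "C", "D": "E"}

pre: List[str] = []
post: List[str] = []
preorder("A", LEFTSON.get, SIBLING.get, pre)
postorder("A", LEFTSON.get, SIBLING.get, post)
print("".join(pre), "".join(post))   # ABDEC DEBCA
```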

4.2.1. Retrieving the Right Sibling of a Node

With this tree structure, the right sibling of a node may be determined by accessing only the rmost and lthread attributes in the tree. For the nodes which are children of a single parent, the lthread attribute links each child except the leftmost son to its left brother (the leftmost son is linked to the left sibling of its closest ancestor with a left sibling, or to the bof node if there is no such ancestor). Using these two links, we can construct a function which returns the right sibling (hereafter referred to simply as the sibling) of a node, or nil if the node is itself a rightmost sibling. This function is presented in Figure 4-1.


Algorithm 4-1: Retrieve the Right Sibling of a Node

    sibling(M):
        Input:  A parse tree node pointer.
        Output: A pointer to the sibling of the node, or nil if none exists.
        Let X be a pointer to a parse tree node.

        X ← rmost(M);                        1
        if X = M then                        1
            return(nil);                     0
        while M ≠ lthread(X) do              R
            X ← lthread(X);                  R − 1
        return(X).                           1

Figure 4-1: Given a parse tree whose nodes are linked together only by parent, lthread, rmost, and rdescend links, retrieve the right sibling of a node. The column to the right counts the number of times each statement is executed, under the assumption that M is not itself a rightmost sibling. (If M is a rightmost sibling, then the running time is 3, since only the first three lines are executed.)
4.2.1.1. Discussion of Algorithm 4-1

Given any node in a completed parse tree, this algorithm will always return the right sibling of the node if one exists, or nil otherwise. This can be seen as follows. Let M be a node in the parse tree. If M is the rightmost child in some production, then its rmost attribute will have been set to itself when this portion of the tree was built, X will test equal to M, and nil will be returned. If M is not the rightmost child in some production, then its rmost attribute will have been set to the node which is the rightmost in the production. The while loop above will succeed in locating the next sibling of M if and only if M appears on the lthread list beginning at rmost(M), and is located immediately after the node which is its next sibling.

M does appear on this lthread list because an LR parser constructs the parse tree by a sequence of operations which correspond to the reverse of a rightmost derivation of the terminal string represented by the parse tree. In a rightmost derivation, the rightmost non-terminal is replaced at each step. The right sentential form produced in this way can be written as $\alpha w$, where $S \overset{*}{\underset{rm}{\Rightarrow}} \alpha w$, $\alpha$ consists of a mixture of both non-terminal and terminal symbols, and $w$ consists only of terminal symbols. As an LR parse progresses, $\alpha$ will correspond to the contents of the parse stack, and $w$ to the unexpended input string. Each node on the parse stack corresponding to a symbol in $\alpha$ will have its lthread attribute set to the node which represents the symbol to its left in $\alpha$, since the nodes in this list are by definition those on the parse stack.

In addition, if $\alpha = \gamma\beta$ and $B \rightarrow \beta$ is a production in the grammar, where $\beta$ is on the top of the parse stack, then $\beta$ is called a handle. Whenever the parser recognizes a handle $\beta$ on the parse stack, it performs the reduction $B \rightarrow \beta$, replacing $\beta$ with the non-terminal $B$. Because the reverse of a rightmost derivation is being performed, the symbols that comprise $\beta$ are exactly the immediate children of the production $B \rightarrow \beta$. Therefore, the first $n$ nodes at the top of the parse stack, where $n = |\beta|$, will be the nodes corresponding to $\beta$, with the rightmost child on top. Since the lthread attribute of the first $n - 1$ nodes at the top of the parse stack is set to their left siblings, and the parse stack is finite in length, by following the lthread links from rmost(M), eventually a node X must be found whose lthread attribute is M, and the algorithm will successfully terminate.

4.2.1.2. Running Time of Algorithm 4-1

When M is the rightmost sibling, only the first three lines of the algorithm are executed, so the running time of the above algorithm is 3, a constant, or O(1). Otherwise, the running time is given by the sum of the counts shown to the right in Figure 4-1, which is 2R + 2, where R, the number of times the while condition is tested, is at most $|\beta| - 1$, $B \rightarrow \beta$ being the production for which M represents a symbol contained in $\beta$. This is also O(1): on average it depends on the mean of the lengths of the production rules represented in the tree, and it is bounded by the length of the longest right-hand side in the grammar, which is independent of the number of nodes in the tree.
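Algorithm 4-1 can be exercised on a tree built the way the parser builds it. In this sketch, the hypothetical TreeBuilder class replays the shifts and reductions for a * x + b and makes the link assignments described in this section: on a reduction each child receives its parent and rmost links, the new node copies rdescend from its rightmost child, and the new node is then shifted, which sets its lthread. Node and function names are our own.

```python
# Sketch of Algorithm 4-1 on a tree that carries only parent, lthread, rmost
# and rdescend links. TreeBuilder is a hypothetical helper that replays the
# parser's shifts and reductions; all names are illustrative.


class Node:
    def __init__(self, label):
        self.label = label
        self.parent = None
        self.lthread = None
        self.rmost = self        # overwritten when the node becomes a child
        self.rdescend = self     # a terminal is its own rightmost descendant


class TreeBuilder:
    def __init__(self):
        self.top = Node("bof")   # top-of-stack variable

    def shift(self, node):
        node.lthread = self.top
        self.top = node
        return node

    def reduce(self, n, label):
        """Reduce the top n nodes to a new node, which is then shifted."""
        children = []
        for _ in range(n):
            children.append(self.top)
            self.top = self.top.lthread
        children.reverse()
        parent = Node(label)
        for child in children:
            child.parent = parent
            child.rmost = children[-1]
        parent.rdescend = children[-1].rdescend
        return self.shift(parent)


def sibling(m):
    """Algorithm 4-1: right sibling of m, or None."""
    x = m.rmost
    if x is m:
        return None
    while m is not x.lthread:
        x = x.lthread
    return x


def build_example():
    """Replay the parse of a * x + b; return the nodes by name."""
    t, n = TreeBuilder(), {}
    n["a"] = t.shift(Node("a"))
    n["E_a"] = t.reduce(1, "E")
    n["*"] = t.shift(Node("*"))
    n["x"] = t.shift(Node("x"))
    n["E_x"] = t.reduce(1, "E")
    n["E_ax"] = t.reduce(3, "E")
    n["+"] = t.shift(Node("+"))
    n["b"] = t.shift(Node("b"))
    n["E_b"] = t.reduce(1, "E")
    n["E_axb"] = t.reduce(3, "E")
    n["S"] = t.reduce(1, "S")
    return n


nodes = build_example()
for name in ("E_a", "*", "E_x", "E_ax", "+", "E_b"):
    right = sibling(nodes[name])
    print(name, "->", right.label if right else None)
```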

4.2.2. Retrieving the Leftmost Son of a Node

Construction of an algorithm to retrieve the leftmost son of a node using some combination of these four links is slightly more complicated, and unfortunately the resulting algorithm will not run on average in O(1) time, as the sibling function does. This algorithm will make the above tree traversals easier to code, although its overall running time may not be very desirable, as we shall see. This algorithm is presented in Figure 4-2.

Algorithm 4-2: Retrieve the Left Son of a Node

    leftson(M):
        Input:  A pointer to a parse tree node.
        Output: A pointer to the left son of the node, or nil if none exists
                (M is a terminal node).
        Let X be a local variable which is a pointer to a parse tree node.

        X ← rdescend(M);                     1
        if X = M then                        1
            return(nil);                     0
        while parent(X) ≠ M do               H
            X ← parent(X);                   H − 1
        while parent(lthread(X)) = M do      R
            X ← lthread(X);                  R − 1
        return(X).                           1

Figure 4-2: Given a parse tree whose nodes are linked together only by parent, lthread, rmost, and rdescend links, retrieve the leftmost son of a node. The column to the right counts the number of times each statement is executed, under the assumption that M is not a terminal node. (If M is a terminal, then the running time is 3, since only the first three lines are executed.)

4.2.2.1. Discussion of Algorithm 4-2

This algorithm works by following the chain of parent pointers back up from the rightmost descendant of a node until the node's rightmost (immediate) child is reached, and then by following the left thread links through the list of children until the leftmost child is reached. Each time that the parser performs a reduction of nodes $X_1 \ldots X_n$ to a parent node M, it makes the following assignments (among others):

    parent(X_k) ← addr(M),  1 ≤ k ≤ n;          (1)
    rdescend(M) ← rdescend(X_n);
which extends the chain of parent links along the right edge of the tree by one level. The first while loop can be shown correct by induction on the height h of the right side of the sub-tree whose root is M. For h = 0, M is a terminal node, and its rdescend attribute points to itself. For h = 1, the rdescend attribute of a non-terminal node points to its rightmost child, which must be a terminal node; likewise, the parent attribute of this rightmost child points back to the non-terminal node, since it is its immediate parent. Now assume that for some k ≥ 1, whenever the right side of a sub-tree has height k, following parent pointers from the rdescend attribute of its root leads back to that root. Consider a sub-tree whose root M has a right side of height k + 1; the right side of the sub-tree rooted at the rightmost child of M then has height k. The rdescend attribute of M is identical to that of its rightmost child, since it is copied from the rdescend attribute of that child, and the parent of the rightmost child of M is M itself, as given in code fragment (1) above. By the induction hypothesis, following parent pointers from rdescend(M) reaches the rightmost child of M, and one more step reaches M. Therefore, the rightmost descendant attribute of all rightmost non-terminals in the sub-tree with root node M points to the rightmost terminal in the sub-tree, and M can always be reached by following parent pointers beginning at rdescend(M). Since the first while loop in Algorithm 4-2 follows parent pointers beginning at the rightmost descendant of M, it will always arrive at a node X whose parent attribute is set to M, and this node will be the rightmost child of M. Therefore, at the conclusion of the first while loop, X will be set to the rightmost child of M.

From the earlier discussion of Algorithm 4-1, we know that if we start at the rightmost child X of M and follow the pointers stored in the lthread attribute, we reach the children of the production whose left hand side symbol is represented by M, we reach these children in right-to-left order, and the parent attribute of each of these children is set to M. Therefore, by looking one node deeper into the stack than X, that is, at node lthread(X), if we find that the parent attribute of this node is not set to M, then we know that X is the first and leftmost son of M. Thus the second while loop always terminates with X set to the left(most) son of M, and X is the value which is returned.
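The Python sketch below transcribes Algorithm 4-2 so that the correctness argument and the statement counts of Figure 4-2 can be checked on a small tree. The Node class, the helpers terminal and reduce_to, and the example tree (S → a T, T → b U, U → c) are hypothetical; reduce_to only mimics the link settings described above (code fragment (1) plus the lthread of the new parent), not the full parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    token: str
    # Linking attributes; repr=False keeps printing from following cyclic links.
    parent: Optional["Node"] = field(default=None, repr=False)
    lthread: Optional["Node"] = field(default=None, repr=False)
    rdescend: Optional["Node"] = field(default=None, repr=False)


def terminal(token: str, below: Node) -> Node:
    """Shift a terminal above `below`; a terminal is its own rdescend."""
    t = Node(token, lthread=below)
    t.rdescend = t
    return t


def reduce_to(token: str, kids: List[Node]) -> Node:
    """Create the parent of `kids` (left to right) as a reduction would."""
    m = Node(token, lthread=kids[0].lthread)  # node beneath the leftmost child
    for k in kids:
        k.parent = m                          # parent(M_k) <- addr(M)
    m.rdescend = kids[-1].rdescend            # rdescend(M) <- rdescend(M_n)
    return m


def leftson(m: Node) -> Tuple[Optional[Node], int]:
    """Algorithm 4-2; also returns the number of statements executed."""
    x = m.rdescend
    steps = 2                                 # X <- rdescend(M); if X = M
    if x is m:
        return None, steps + 1                # return(nil)
    while True:
        steps += 1                            # test: parent(X) != M
        if x.parent is m:
            break
        x = x.parent
        steps += 1
    while True:
        steps += 1                            # test: parent(lthread(X)) = M
        below = x.lthread
        if below is None or below.parent is not m:
            break
        x = below
        steps += 1
    return x, steps + 1                       # return(X)


# Example tree for "a b c" with S -> a T, T -> b U, U -> c.
bottom = Node("$bottom")
a = terminal("a", bottom)
b = terminal("b", a)
c = terminal("c", b)
U = reduce_to("U", [c])
T = reduce_to("T", [b, U])
S = reduce_to("S", [a, T])

son, steps = leftson(S)
print(son.token, steps)  # H = 3, R = 2, so 2H + 2R + 1 = 11
```

Calling leftson on the terminal c returns nil after 3 statements, matching the figure caption.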

4.2.2.2.  Running  Time  of  Algorithm  4-2 


If M is a non-terminal node, the running time of leftson is 2H + 2R + 1, where H is the height of the right side of the sub-tree rooted at M, and R is the length of the right hand side of the production rule whose left hand side non-terminal is M. (If M is a terminal node, then the running time is 3.) Since R depends on the length of the production rule and not on the number of nodes in the parse tree, it is O(1). However, H is the height of the right side of the sub-tree rooted at M, which does depend upon the size of the tree.
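Summing the right-hand column of Figure 4-2 gives the formula directly; the derivation below only restates those counts:

```latex
\begin{aligned}
T_{\text{non-terminal}}
  &= \underbrace{1 + 1}_{\text{assignment, test}}
   + \underbrace{H + (H - 1)}_{\text{first loop}}
   + \underbrace{R + (R - 1)}_{\text{second loop}}
   + \underbrace{1}_{\text{return}}
   = 2H + 2R + 1, \\
T_{\text{terminal}} &= 1 + 1 + 1 = 3 .
\end{aligned}
```

For example, a node whose right edge has height H = 3 and whose production has R = 2 symbols costs 2(3) + 2(2) + 1 = 11 units.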

The best case for H occurs when the production with parent M has a rightmost child which is a terminal node; in this case H = 1, and the overall algorithm becomes O(1). To determine the likelihood of this case, it is necessary to examine the grammar, noting the number of productions which end in a terminal node and their relative frequency in the language.¹

The worst case for H occurs when all productions in the sub-tree with root M have length 1; in this case H = n - 1, where n is the number of nodes in the tree, and H is O(n). For languages specified with unambiguous BNF grammars, the grammar will contain a number of renaming rules, many in high-frequency use, such as:

    <expression> ⇒ <term> ⇒ <factor> ⇒ <variable> ⇒ <identifier>

We would like to discover the average value of H for a given grammar. Unfortunately, this is in general a difficult question to answer. Empirical estimates can be made by analyzing collections of programs written in the language and computing the mean and standard deviation of H and R. However, the programs included in such a study must be chosen carefully to arrive at a representative sample.

If the grammar follows a regular pattern, it might lend itself to a more mathematical analysis. Consider, for example, a grammar in which every production is of length 2. This grammar produces binary parse trees (R = 2). The variable H measures the external path length along the right edge of the parse tree. If we assume that the external path length along the right edge of the parse tree is no different from the external path length from the root to any of the terminal nodes, then we can take H to be proportional to the average external path length of these trees. In this case, given a parse tree with n internal nodes, if we further assume that sentences produce complete binary trees (trees with minimum height), we get an average path length of log₂ n. If instead we assume that all possible tree constructions with n nodes are equally likely, we get an average path length on the order of √n.² Unfortunately, grammars for real programming languages are unlikely to produce trees which closely follow either form.

¹For example, the C language grammar in use on the SAGA system contains 288 production rules; of these, 100 have a terminal node as the rightmost child.

Since LR(1) parsers operate by recognizing production rule handles on the parse stack and reducing them to single-node parents, it seems reasonable to expect parse trees to take on an overall shape closer to a complete tree than to either a degenerate (linear) tree or many of the other structures possible with n nodes (although it is certainly possible to construct a grammar with either of these properties). Since a complete tree seems overly optimistic, a value of log₂ n for H is likely to be a good lower bound.

As the value of R increases, where R is based on a weighted (frequency) count of the lengths of the production rules in a grammar, we can expect the average parse tree height H to decrease for a given value of n, the number of nodes in the tree. We can hypothesize that H and R are inversely related to one another, and that a small value of R implies a tree with generally longer external path lengths (an “overhead” factor of non-terminals, in a sense). Table 4-3 presents some measurements of R, both as a simple average over the (unweighted) productions of several grammars and as an average based on the frequency of productions found in a set of programs written in each language.
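One way to make the “overhead” interpretation concrete is a simple counting argument, offered here as an illustration rather than taken from the measurements in Table 4-3. Assume a parse tree with N non-terminal nodes and t terminal nodes, no productions with empty right hand sides, and let R̄ be the mean right hand side length over the N production instances actually used in the tree (the program-weighted average). Every node except the root has exactly one parent, so counting parent-child links in two ways gives:

```latex
N\bar{R} = (N + t) - 1
\quad\Longrightarrow\quad
\frac{N}{t} = \frac{1 - 1/t}{\bar{R} - 1} \approx \frac{1}{\bar{R} - 1}
\qquad (\bar{R} > 1).
```

With R̄ = 2 (binary trees) there is roughly one non-terminal per terminal, while as R̄ approaches 1, that is, as renaming rules dominate, the number of non-terminals per terminal grows without bound, which tends to lengthen the paths that H measures.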

4.3.  Providing  leftson  and  right  sibling  Attributes

The four attributes for parse tree linking are the minimum necessary to support the incremental parser. The editor’s incremental parser never needs to determine either the leftson or the sibling of a node, since the parse tree is built bottom up. However, the editor and other routines which need to traverse a completed parse tree will need to determine this information. An implementor of the parsing algorithm must determine how often the leftson and sibling functions are likely to be used. The cost of adding each attribute is the space required to store another link in each parse tree node, plus the code to maintain it. The benefit for leftson access is that an operation which previously took somewhere between O(log₂ n) and O(n) running time will take unit time, or O(1), since it becomes a simple lookup. The benefit for sibling access is that an operation which formerly took constant time (a value of 6 time units for R = 2) will also take unit time (1 time unit).

²[Knuth, 73], section 2.3.4.5 (p. 400).

Language   Number of          Average Production   Average Production   Ratio of Non-
           Production Rules   Length R             Length R             terminals to
                              (Grammar)            (in Programs)        Terminals

Pascal     217                2.12
FP         105                1.35
Ada        432                2.48                 1.41
C          288                1.94                 1.14

Table 4-3: Average production rule lengths for several SAGA grammars and for sets of programs written in those languages.

Since the traversal programs would require much longer running times if leftson and sibling had to be computed by these functions, both a leftson and a (right) sibling attribute have been added to the parse tree. Because the sibling attribute and the rmost attribute are so similar, the rmost attribute has been removed, since it can be calculated by following a sequence of sibling pointers. As we will see shortly, the incremental parsing algorithm accesses the rmost attribute only during a late phase of the reparse, when it is reparsing a section of the tree previously entered, and then only to avoid shifting nodes in a production which already have their lthread fields properly set. Without the rmost attribute, we now need to follow a pointer chain to reach the rightmost sibling, but no other work needs to be done (the nodes do not need to have the shift operation performed again by the parser, since their links are already correctly set). Because the average production lengths are small, this pointer chain will be short, and little additional work is actually required.
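A minimal Python sketch of the revised node layout follows. It assumes, hypothetically, that a reduction sets leftson and sibling when it creates the parent, and that rmost(X) denotes the rightmost sibling of X, that is, the last node of the production X belongs to. The names attach, rmost and children are illustrative only. The sketch shows that rmost is recovered by a short walk along sibling links, while leftson becomes a single field lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    token: str
    parent: Optional["Node"] = field(default=None, repr=False)
    leftson: Optional["Node"] = field(default=None, repr=False)  # stored: O(1)
    sibling: Optional["Node"] = field(default=None, repr=False)  # right sibling


def attach(parent: Node, kids: List[Node]) -> Node:
    """Set parent, leftson and sibling links for a newly reduced production."""
    parent.leftson = kids[0]
    for left, right in zip(kids, kids[1:]):
        left.sibling = right
    kids[-1].sibling = None
    for k in kids:
        k.parent = parent
    return parent


def rmost(node: Node) -> Node:
    """Replacement for the removed rmost attribute: follow sibling links."""
    while node.sibling is not None:
        node = node.sibling
    return node


def children(parent: Node) -> List[Node]:
    """Traverse a production using only leftson and sibling."""
    out, kid = [], parent.leftson
    while kid is not None:
        out.append(kid)
        kid = kid.sibling
    return out


# E -> E + T
e1, plus, t = Node("E"), Node("+"), Node("T")
expr = attach(Node("E"), [e1, plus, t])
print([k.token for k in children(expr)], rmost(e1).token)
```

The walk in rmost costs one step per remaining sibling, which stays small when average production lengths are small.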

4.4.  Linking  the  Terminal  Nodes  Together 

The editor maintains a text image display of the tree (hiding the internal structure), so it must be able to access successive terminal nodes efficiently to retrieve the text representation of the token represented by each of these nodes. The initial version of the editor used inherited attributes for the text formatting information and a preorder tree traversal to produce the display, but this scheme had two difficulties. First, the input had to be successfully parsed in order to meaningfully generate the display, and second, the computation time needed to traverse the tree was excessive. By chaining the terminal nodes together into a doubly linked list, and processing only this list to produce the display, better response was achieved; in addition, by storing newline and spacing information in the terminal nodes, the user’s format could be preserved and the display produced whether or not the input could be successfully parsed.³ (A pretty printer could still be invoked at the conclusion of the parse to reformat the display, if desired.)

Therefore, two additional attributes, prev and next (discussed further in Chapter 5), were added to the parse tree node. With these additional links, the parse tree structure serves well to support the functionality required by the editor.
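The display scheme can be sketched as a walk over the doubly linked terminal list. Encoding the layout as a count of preceding newlines and spaces per token (the fields newlines and spaces) is an assumption made for illustration; the text says only that newline and spacing information is stored in the terminal nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Terminal:
    text: str
    newlines: int = 0  # line breaks before this token (assumed encoding)
    spaces: int = 0    # blanks before this token on its line (assumed encoding)
    prev: Optional["Terminal"] = field(default=None, repr=False)
    next: Optional["Terminal"] = field(default=None, repr=False)


def link(tokens: List[Terminal]) -> Terminal:
    """Chain terminals into a doubly linked list and return its head."""
    for left, right in zip(tokens, tokens[1:]):
        left.next, right.prev = right, left
    return tokens[0]


def render(head: Optional[Terminal]) -> str:
    """Rebuild the user's text from the terminal list alone (no tree walk)."""
    parts = []
    node = head
    while node is not None:
        parts.append("\n" * node.newlines + " " * node.spaces + node.text)
        node = node.next
    return "".join(parts)


head = link([
    Terminal("x"), Terminal("=", spaces=1), Terminal("1", spaces=1), Terminal(";"),
    Terminal("y", newlines=1), Terminal("=", spaces=1), Terminal("2", spaces=1), Terminal(";"),
])
print(render(head))
```

Because render never consults the non-terminal nodes, it produces the display whether or not the tokens form a valid sentence.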


³A further improvement in response time was achieved by physically grouping terminal nodes together, and non-terminal nodes together, in memory. This was accomplished by rewriting the node allocation routine to allocate both a block of non-terminal nodes and a block of terminal nodes (it still passed them back one at a time, as new nodes were needed). The editor runs on an operating system with demand-paged memory, and its parse tree can also be demand paged within the editor’s own data buffers (discussed in Chapter 6). This grouping resulted in fewer page faults when a display was generated, and improved response on a multi-user system, since a smaller working set of memory pages was sufficient to support the editor.
4.5.  Summary

In this chapter, we have examined the parse tree access required by an editor, investigated some possible parse tree structures, and chosen a structure which is adequate to support the editor efficiently. The net result is that the leftson, prev and next attributes have been added, and the rmost attribute has been replaced by the sibling attribute. In the next chapter, the incremental reparsing algorithm is introduced and extended. More will be said about these modifications at that point, when their impact upon the algorithm is discussed. All of the extensions which were made to the algorithm can still be made independently of the changes made to the attributes of a node, so another implementation could be based on the original parse tree structure together with the leftson and sibling routines defined in this chapter. We now turn our attention to the incremental LR parsing algorithm, discussed in Chapter 5.
CHAPTER  5

THE  INCREMENTAL  PARSER 


In  this  chapter,  the  editor’s  incremental  parser  is  presented.  This  parser  is  based  on,  and 
extended  from,  an  incremental  LR(0)  parsing  algorithm  by  Ghezzi  and  Mandrioli  [Ghezzi  and 
Mandrioli,  80].  As  published,  the  algorithm  assumes  the  use  of  parser  tables  produced  by  an 
LR(0)  parser-generator.  The  input  grammar  is  assumed  to  be  an  LR(0)  context-free  grammar 
excluding  productions  with  empty  right  hand  sides. 

To adapt the algorithm for use with the SAGA editor, a number of extensions were made. Since many programming languages are based on LALR(1) grammars, the algorithm has been extended from LR(0) to LR(1) (also handling LALR(1) and SLR(1) grammars). It has also been extended to support grammars containing productions with empty right hand sides.

We have adopted a new way of handling comments, which eliminates several problems: (1) restriction of comments to limited placements in the language, which is necessary for syntax-directed template editors; (2) storage and maintenance of comments in the parse tree, since it is difficult to store them in the tree and display them for the user while hiding them from the routines which analyze the syntax of the tree; and (3) uniformity of access by editor commands to both comments and syntactically meaningful tokens in the tree. In other systems, comments have been attached to other tokens, have not always been displayed automatically, and have required additional commands designed specifically to enable them to be edited. We have defined a comment as a lexical class, and modified the parsing algorithm to recognize comments and handle them in an appropriate manner.

We have redefined the reduce operation, proposing an alternative which permits the parser to treat non-terminal tokens in the same manner as terminal tokens. We have also combined the parsing action and goto functions into a single action. Both of these modifications eliminate duplicate code in the incremental parser, improve its efficiency, and permit the editor to pass sub-trees to the parser, as well as lists of terminal tokens.
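To illustrate the two changes to reduction, here is a small Python sketch (not the SAGA implementation) of an LR(0) parser for the hypothetical toy grammar S → ( S ) | x. A single table maps (state, symbol) to a successor state for terminals and non-terminals alike, standing in for separate action and goto functions, and a reduction prepends the new parent node to the input, following the alternate reduction mentioned in Chapter 3, so that the parent is then shifted exactly like a terminal. The state numbers come from the usual LR(0) item-set construction for this grammar.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical toy grammar: rule 1 is S -> ( S ), rule 2 is S -> x.
RULES = {1: ("S", 3), 2: ("S", 1)}      # rule -> (left side, right side length)
# One table for terminals AND non-terminals: former goto entries are shifts.
TABLE = {
    (0, "("): 2, (0, "x"): 3, (0, "S"): 1,
    (1, "$"): "accept",
    (2, "("): 2, (2, "x"): 3, (2, "S"): 4,
    (4, ")"): 5,
}
REDUCE = {3: 2, 5: 1}                   # LR(0): these states always reduce


def parse(items: List[Node]) -> Node:
    inp = deque(items + [Node("$")])
    stack = [(0, None)]                 # (state, node); bottom-of-stack pad
    while True:
        state = stack[-1][0]
        if state in REDUCE:
            lhs, length = RULES[REDUCE[state]]
            kids = [node for _, node in stack[-length:]]
            del stack[-length:]
            inp.appendleft(Node(lhs, kids))  # alternate reduce: prepend parent
            continue
        action = TABLE.get((state, inp[0].symbol))
        if action is None:
            raise SyntaxError(f"unexpected {inp[0].symbol!r} in state {state}")
        if action == "accept":
            return stack[-1][1]
        stack.append((action, inp.popleft()))  # shift terminal or non-terminal


def show(node: Node) -> str:
    if not node.children:
        return node.symbol
    return node.symbol + "[" + " ".join(show(c) for c in node.children) + "]"


print(show(parse([Node(ch) for ch in "((x))"])))
# An already-built sub-tree can be passed in the input like any token:
print(show(parse([Node("("), Node("S", [Node("x")]), Node(")")])))
```

The second call shows why the uniform treatment matters for an editor: a sub-tree handed to the parser is shifted through the same table entry that a freshly reduced S would use.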

Explicit error handling actions have been introduced, since a working editor must be able to recover from a user’s syntax errors. The original algorithm identifies syntax errors, but states only “Jump to the appropriate error recovery action.” There are several possible approaches, and a decision must be made about whether to provide automatic error correction, but the best choice for an interactive, incrementally parsing editor is not obvious. We have in fact tried a couple of different approaches to the handling of errors before settling upon the current scheme, described below in section 5.9, as the most suitable.

We have altered the attributes associated with the parse tree node proposed in the original statement of the incremental parser, since that structure is not suitable for use with an editor. The alteration of one attribute and the introduction of some additional ones enable the editor’s command and display modules to work directly from the parse tree. This eliminates the need to keep an additional text representation, and the additional complexity that would be required to maintain consistency between the textual and structural forms of the data.

A parse tree was chosen as the data structure, instead of an abstract syntax tree, since it enables both the editor and the display manager to work directly with the tree. While abstract trees require less space, systems which use them require an unparser to reconstruct the original text image for display, and a second data structure to retain this text image for the editor. (In the SAGA editor, a text image of the data actually displayed on the screen is kept by the window management package, but it is used only to optimize updates to the display.) With abstract trees, the formatting of the displayed text typically is limited, and in some cases the user is permitted no choice. With a parse tree, however, the user can format his program any way he pleases, and this information is retained. Pretty-printing programs also can be used to reformat all or any part of the tree in some standardized or customized format.

In this chapter, we describe and present the original Ghezzi and Mandrioli incremental parser, and introduce some additional variables which will help in the subsequent description of our extensions. Then we review the extensions that we made in Chapter 4 to the parse tree structure, discuss their impact upon the parsing algorithm, and present the modifications required to support them. Remaining sections introduce the extensions to the algorithm required to support the editor capabilities mentioned above. Finally, we restate the algorithm at the end of the chapter, with all of the extensions incorporated into it. In Chapter 6, we turn our attention to the editor itself, show how the incremental parser is interfaced to the editor, and discuss the fundamental command capabilities which provide support for both text and structure commands.

5.1.  Description  of  the  Algorithm 

The original Ghezzi and Mandrioli parsing algorithm is described in this section to give the reader a feel for the operations that occur during incremental reparsing. The algorithm itself is presented in the next section. Subsequent sections then introduce the extensions that have been made. The resulting algorithm, used by the SAGA editor, is presented at the end of this chapter, summarizing the extensions made.

Given a grammar G = (N, Σ, P, S), a terminal string w = xzy ∈ L(G), with x, z, y ∈ Σ*, and a parse tree T for w, we wish to substitute the string of tokens z’ ∈ Σ* for z in T, incrementally reconstructing the parse tree T’ for the new string w’ = xz’y. (Note that any of x, y, z, and z’ may be empty, in which case we may have only an insertion or deletion, or an initial parse.) To aid in the description of the algorithm, we introduce four variables not present in the original algorithm: activenode, deletecount, nextusernode and nextnode. Activenode points to the terminal node at which an editing cursor is positioned; all deletions begin with this node, and all insertions occur just before it. Thus activenode is positioned on the node representing the first token in zy. Activenode is passed to the parser together with deletecount, which is assigned the number of nodes to be deleted (the length of z). Nextusernode is set to the first symbol in the input string z’ to be read by the parser, or nil if z’ is empty. Nextnode is set to the node corresponding to the first token in y; it is initially assigned a pointer to the node deletecount nodes past activenode. As the parser reads the nodes corresponding to the tokens in y, this variable is advanced, so that it always marks the next node to be read.
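The relationships among these variables can be sketched on a plain list of terminal nodes linked by next pointers. The helper names chain and find_nextnode are hypothetical, and the sketch assumes deletecount never runs past the end of the token list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Terminal:
    text: str
    next: Optional["Terminal"] = field(default=None, repr=False)


def chain(texts: List[str]) -> List[Terminal]:
    """Build terminal nodes linked left to right through `next`."""
    nodes = [Terminal(t) for t in texts]
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
    return nodes


def find_nextnode(activenode: Terminal, deletecount: int) -> Optional[Terminal]:
    """Return the node `deletecount` terminals past activenode: the first token of y."""
    node: Optional[Terminal] = activenode
    for _ in range(deletecount):
        node = node.next
    return node


# w = x z y with x = "a", z = "b c", y = "d e"; replace z by z' = "q".
old = chain(["a", "b", "c", "d", "e"])
activenode = old[1]              # first token of z y
deletecount = 2                  # length of z
nextnode = find_nextnode(activenode, deletecount)
new = chain(["q"])               # tokens of z'
nextusernode = new[0] if new else None
print(activenode.text, nextnode.text, nextusernode.text)
```

With deletecount = 0 (a pure insertion), nextnode coincides with activenode.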

Variable stacktop, corresponding to top in the original algorithm, is set to the node on the top of the parse stack, and irmark, corresponding to mark, is set to the node on the parse stack just to the left of the first node included in the new sub-tree being constructed by the parser. The irmark variable will be used later to terminate the reparse; it will be discussed at that time.

5.1.1.  Initialization

Let M be the node at which the editing cursor is positioned (and at which activenode is set). To perform the initialization, we must restore the state of the parser to that which existed just before the first token of zy was shifted during the previous parse. Since each and every state through which the parser passed during the previous parse of T is stored in the lthread and pstate attributes associated with the nodes in T, the parse stack can be restored simply by setting stacktop to the value of lthread(M). The state of the parser is then given by the value of pstate(stacktop). The reader can verify that the view of all of the sub-trees available from the stacktop variable is identical to the view that the parser would have obtained had it actually restarted the parse from the beginning of w and proceeded up to this point.

The first time that the parser runs, there is no initial tree; in this case, stacktop is set to a bottom-of-stack node B, a special parse node whose pstate attribute is set to the initial state of the parser. This node serves as a pad token at the bottom of the parse stack, and can only be reached through the lthread links in the tree.

In addition, the variable irmark is set to the value of stacktop, since this node is part of the old tree, and the next node to be shifted will be a part of the new tree being constructed. Finally, the input characters are lexically analyzed by a tokenizing routine, which constructs a linked list of terminal nodes corresponding to the tokens in z’, and assigns the first node on this list to nextusernode.
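A sketch of this initialization follows, assuming hypothetical PNode objects that carry only token, pstate and lthread, and arbitrary state numbers. Reading the lthread chain downward from stacktop reproduces the parse stack as it stood just before activenode was last shifted, including any non-terminal that had already been reduced by then.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class PNode:
    token: str
    pstate: int = 0
    lthread: Optional["PNode"] = field(default=None, repr=False)


def stack_view(stacktop: PNode) -> List[str]:
    """List the parse stack bottom to top by following lthread links."""
    out = []
    node: Optional[PNode] = stacktop
    while node is not None:
        out.append(node.token)
        node = node.lthread
    return out[::-1]


def initialize(activenode: Optional[PNode], bottom: PNode) -> Tuple[PNode, int, PNode]:
    """Return (stacktop, parser state, irmark) for a reparse starting at activenode."""
    stacktop = activenode.lthread if activenode is not None else bottom
    return stacktop, stacktop.pstate, stacktop   # irmark starts at stacktop


# Old parse of "a b c": "a b" had been reduced to A before c was shifted.
B = PNode("$B", pstate=0)        # bottom-of-stack pad node
a = PNode("a", 2, B)
b = PNode("b", 3, a)
A = PNode("A", 1, B)             # lthread(A) is the node beneath a
c = PNode("c", 4, A)

stacktop, state, irmark = initialize(c, B)
print(stack_view(stacktop), state)
```

For the very first parse there is no activenode, and initialize(None, B) simply starts from the pad node B.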

5.1.2.  Deletion  of  z 

If z is non-empty, it indicates a group of tokens to be deleted. The deletion of z is accomplished by advancing the editing cursor by the number of terminal tokens to be deleted, and setting the nextnode variable to the node to which the editing cursor now points; this node corresponds to the first token of y. Since the terminal nodes in the parse tree corresponding to z lie between the points marked by stacktop and nextnode, they will be excluded from the new parse tree constructed during this reparse; no further action is required.¹


¹It is desirable to place these nodes onto a list of deleted nodes, or onto a “last nodes deleted” list, to support an undo operation. Of course, a garbage collector could also be provided to periodically sweep through memory and reclaim those nodes no longer reachable from the root of the parse tree.
5.1.3.  Insertion  of  z’

If z’ is empty, there is no new input, and we immediately skip to the reparsing of y, discussed in the following section. Otherwise, nextusernode points to the list of terminal nodes corresponding to the tokens in z’ to be inserted, and we proceed as follows. The parsing action f is determined using the current parse state pstate(stacktop). If it is shift, then the node pointed to by nextusernode is pushed onto the parse stack, and nextusernode is advanced to the next node in the list.

If f is reduce, then a new node is allocated to be the parent of the nodes on the top of the stack which correspond to the right hand side of the production rule; these nodes are popped from the stack. The parse state stored in the node which becomes the top of stack, together with the left hand side non-terminal, is used to evaluate the goto function g. The parent node is pushed onto the stack and assigned this state, which becomes the new state of the parser.²
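The stack operations of the insertion phase can be sketched as follows, assuming hypothetical PNode objects with token, pstate, lthread and parent fields and a goto table keyed by (state, non-terminal). The parse stack is never stored separately: pushing sets lthread, and popping follows it. The demonstration drives the operations by hand for the toy grammar S → ( S ) | x, using state numbers from an LR(0) construction for that grammar; choosing which action to take is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class PNode:
    token: str
    pstate: int = 0
    lthread: Optional["PNode"] = field(default=None, repr=False)
    parent: Optional["PNode"] = field(default=None, repr=False)


def shift(stacktop: PNode, node: PNode, state: int) -> PNode:
    """Push `node`: it records the node beneath it and becomes the new stacktop."""
    node.lthread = stacktop
    node.pstate = state
    return node


def reduce(stacktop: PNode, lhs: str, length: int,
           goto: Dict[Tuple[int, str], int]) -> PNode:
    """Pop `length` nodes under a new parent, then push the parent with state g."""
    parent = PNode(lhs)
    node = stacktop
    for _ in range(length):
        node.parent = parent
        node = node.lthread            # popping is just following lthread
    # `node` is now on top of the stack; its stored state selects the goto entry.
    return shift(node, parent, goto[(node.pstate, lhs)])


GOTO = {(0, "S"): 1, (2, "S"): 4}      # goto entries for S -> ( S ) | x
B = PNode("$B", pstate=0)
lp, x, rp = PNode("("), PNode("x"), PNode(")")
top = shift(B, lp, 2)
top = shift(top, x, 3)
top = reduce(top, "S", 1, GOTO)        # S -> x
top = shift(top, rp, 5)
top = reduce(top, "S", 3, GOTO)        # S -> ( S )
print(top.token, top.pstate, top.lthread is B)
```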

If the string y in the old tree is empty, or the parser reaches its end, then it is also possible for the parsing action f to be accept, in which case the parent node corresponds to the start symbol of the grammar, the indicated reduction is performed, and the parser is placed into its final state and terminates.

If the next input token is invalid in the current parser context, then the parse action is error, and the parse is suspended at this point. Discussion of error handling is deferred until later in the chapter.

Assuming that w’ = xz’y is a legal sentence in the language, eventually the parser will shift the last token in z’, perform zero or more reductions, and then be ready to shift the first token in y. At this point in the parse, the parsing of z’ is complete, and we begin the reparse of y.

²The alternate reduction mentioned earlier in Chapter 3, in which the new parent is prepended to the input stream, will be added as an extension to this algorithm later in this chapter.

5.1.4.  Reparse  of  y 

While re-scanning y, the parser handles each of the possible parse actions as above, except that it makes some additional checks in an attempt to optimize the reparse, which saves a considerable amount of work. Each time that the parser shifts a node M corresponding to either a token in y or the parent of a token in y, it compares the new parse state g(M) against the parse state pstate(M) recorded the previous time that M was shifted. Since we are re-scanning y, which by definition did not change, if this comparison shows the states to be equal, then we are guaranteed that if the parser continued reparsing the sub-trees of this node’s right siblings, it would achieve exactly the same result as before. Therefore, the parser can skip these steps of the analysis, and directly reset its stacktop variable to the rightmost sibling of M (note that the lthread attributes of the right siblings of M are all already correctly set, so that this reassignment causes them to appear on the parse stack just as though each had been individually shifted again).
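The state-comparison shortcut can be sketched as below. The sketch assumes the sibling attribute of Chapter 4 (the original algorithm would use rmost instead), hypothetical PNode objects, and arbitrary state numbers; shift_during_reparse is an illustrative name, not a routine from the editor.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class PNode:
    token: str
    pstate: int = 0
    lthread: Optional["PNode"] = field(default=None, repr=False)
    sibling: Optional["PNode"] = field(default=None, repr=False)


def rightmost_sibling(node: PNode) -> PNode:
    while node.sibling is not None:
        node = node.sibling
    return node


def shift_during_reparse(stacktop: PNode, m: PNode, new_state: int) -> Tuple[PNode, bool]:
    """Shift old node m of y; skip its right siblings if its state is unchanged."""
    m.lthread = stacktop               # m now sits on the current stack
    if new_state == m.pstate:
        # The siblings' lthread links still chain back to m, so making the
        # rightmost one stacktop is equivalent to shifting each of them again.
        return rightmost_sibling(m), True
    m.pstate = new_state
    return m, False


def stack_view(top: Optional[PNode]) -> List[str]:
    out = []
    while top is not None:
        out.append(top.token)
        top = top.lthread
    return out[::-1]


# Old production p q r (states 4, 6, 7); N is a node built by this reparse.
N = PNode("N", pstate=2)
p, q, r = PNode("p", 4), PNode("q", 6), PNode("r", 7)
p.sibling, q.sibling = q, r
q.lthread, r.lthread = p, q

top, skipped = shift_during_reparse(N, p, 4)
print(stack_view(top), skipped)
```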

The parser’s next action must be to perform a reduction. Let M now be set to the parent of the node on the top of the stack. If the parent attribute of each of the children to be included in the reduction points to M, then this node can also be reused. The parser only needs to reset the stacktop variable to this node (the lthread, rmost, and parent links of all of its children are already correctly set), and to set the lthread attribute of M to the node which is on the top of the stack once the nodes corresponding to the right hand side of this production rule have been popped.

The parser then repeats this shift/reduce process, comparing each new parse state to the one stored and continuing to skip steps in the analysis, until it reaches a reduction in which not all of the children have their parent attribute set to M.
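The parent-reuse step can be sketched as a check over the top length nodes of the lthread-linked stack. The helper try_reuse_parent and the node fields are hypothetical; the sketch only tests the parent links and resets lthread(M), as described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class PNode:
    token: str
    lthread: Optional["PNode"] = field(default=None, repr=False)
    parent: Optional["PNode"] = field(default=None, repr=False)


def try_reuse_parent(stacktop: PNode, length: int) -> Optional[PNode]:
    """Reuse M = parent(stacktop) if all `length` right side nodes have parent M."""
    m = stacktop.parent
    if m is None:
        return None
    node: Optional[PNode] = stacktop
    for _ in range(length):
        if node is None or node.parent is not m:
            return None
        node = node.lthread
    m.lthread = node        # the node left on top once the right side is popped
    return m                # M becomes the new stacktop


X = PNode("X")
M = PNode("M")
p = PNode("p", lthread=X, parent=M)
q = PNode("q", lthread=p, parent=M)
print(try_reuse_parent(q, 2) is M, M.lthread is X)
```

When the check fails, the parser falls through to the match condition test of the next subsection.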

At this point, the match condition can be tested, since some of the children correspond to elements of z and/or z’, or to parents of such elements. If a match is indicated, then the incremental reparse terminates with this reduction, and the new tree T’ is complete. If not, a new non-terminal node is allocated, the reduction is performed, and nextusernode is set to the token following the rightmost descendant of this new sub-tree. When the node to which nextusernode now points is shifted, the parse state comparison described above can be performed, and the procedure repeated, until either the match condition tests true, or a reduction is made to the start symbol of the grammar and the parser accepts the string.

5.1.5.  Testing  the  Match  Condition 

Whenever the parser is ready to make a reduction while reparsing y, it checks a set of matching conditions to determine whether the parent of the node on the top of the stack can be reused in the reduction that is about to be performed, and whether the left and right edges of the resulting sub-tree mesh cleanly into the structure of the old tree. If these requirements are satisfied, then we are guaranteed that if we did continue the parse beyond this point, all subsequent actions of the parser would be identical to those that were taken when the remainder of the parse tree was last constructed. Therefore, we can terminate the parse at this point, having incrementally produced the tree T’ corresponding to the sentence w’ = xz’y.

Let A → α be the reduction that is about to be performed, let stacktop be set to the node corresponding to the rightmost token in α, and let M be the parent of this node, or nil if none yet exists (the sub-tree is new). Also, let α = a₁a₂…aₙ, where n = |α|, and let Mₖ be the node corresponding to aₖ, for 1 ≤ k ≤ n. Whether the matching condition holds at a given point in the parse tree can be determined by performing the following five tests prior to carrying out the reduction:

    (m1)  M ≠ nil and token(M) = A;
    (m2)  irmark = Mₖ, for some k, 1 ≤ k ≤ n;
    (m3)  parent(Mᵢ) = M, for 1 ≤ i ≤ k;
    (m4)  parent(lthread(M₁)) ≠ M;
    (m5)  rdescend(Mₙ) = rdescend(M).

Condition (m1) checks whether the token that labels the parent node $M$ in the original tree T is the same as the token $A$ on the left hand side of the production rule about to be used. Clearly, if $M$ = nil, or if these non-terminal tokens do not match, then the new sub-tree about to be produced cannot reuse the node $M$, and the match condition cannot be satisfied.

Condition (m2) checks that irmark points to a node $M_k$, $1 \le k \le n$, which is to be included in the reduction. Recall that irmark is always set to the node closest to the top of the stack which existed in the original tree T and has not yet been included in a new reduction. If irmark = $M_k$, then we know that $M_k$ existed in the original tree T, and that the newly created nodes which are descendants of the $M_j$, $1 \le j \le n$, mesh correctly into the preexisting nodes in T to their left. It only remains to be shown that the parent node $M$ can be cleanly grafted into T, and that the right edge of this new sub-tree fits into that portion of T to its right.

Nodes $M_1 \ldots M_k$ existed originally in T. Condition (m3) checks that the parent of each of these nodes is $M$. The sub-trees rooted at $M_j$ for $j > k$ are either newly rebuilt or just reparsed by the parser, so they either have no parent yet, or their parent reference is not relevant, since it is not necessarily based on the original tree T.

Condition (m4) checks that node $M_1$, which is about to become the leftmost son of $M$, was previously the leftmost son of $M$. (The node on the parse stack beneath $M_1$ must have a different parent than $M$.) If $M_1$ previously was a son of $M$ but not its leftmost son, then we will not be able to terminate the parse at $M$, since doing so would leave the original leftmost son of $M$ as a dangling node within the tree, which would no longer be well-structured.

Term used here      Ghezzi and Mandrioli     Description

Variables
  B                 B                        Node at bottom of parse stack
  M                 M                        Node of T
  T                 T                        Threaded parse tree
  activenode        first(y)                 First node in y
  irmark            mark                     Incremental reparse marker
  oldtable          old-table                Temporary table value
  stacktop          top                      Top node of the stack
  deletecount                                Number of tokens to be deleted
  nextusernode                               Next node in z' to be read
  nextnode                                   Next node in y to be read

Attributes of nodes
  addr(M)                                    Address of M
  lthread(M)                                 Next in pushdown list after M
  parent(M)                                  Father of M
  pstate(M)         t(M)                     LR table of M (parse state)
  rdescend(M)       rd(M)                    Rightmost descendant of M
  rmost(M)          rb(M)                    Rightmost brother of M
  token(M)          y(M)                     Element of V in M
  leftson(M)                                 Leftmost son of non-terminal M
  next(M)                                    Next terminal in tree after M
  prev(M)                                    Previous terminal before M
  sibling(M)                                 Next sibling to right of M

Functions
  alloc()           take()                   Allocate a new node
  apply_match()     apply-matching()         Graft new tree into old
  f()               f                        Current parsing action
  g()               g                        Current goto function
  matchcond()       matching-condition()     Can new tree be grafted into old tree at current spot?
  reduce(i, M)      apply-reduction(i, M)    Reduce by production i, making M the parent node
  shift(M)          apply-shift(M)           Shift M onto stack
  shift(M, i)       apply-shift(M)           Shift M onto stack, go to parse state i (replaces shift above)
  stack(M, j)       p^j(M)                   p^0(M) = M; p(M) = lthread(M); p^j(M) = p(p^(j-1)(M))
  action(...)                                Combined f and g functions
  chain(M, ...)                              Link M into next, prev list
  nextsym()                                  Next node to be read by parser
  unchain(M)                                 Unlink M from next, prev list

Table 5-1: Notation used in the description of the LR(0) incremental reparsing algorithm.

Lastly, condition (m5) checks that the rightmost descendant of $M_n$ matches the rightmost descendant of the parent node $M$, to guarantee that the newly created sub-tree which is re-using the pre-existing parent node $M$ has the same right edge as the sub-tree rooted in $M$. If these nodes do not match, then we cannot terminate the parse at $M$, since some nodes will be left dangling where the right edge of the new sub-tree meets the old tree, and the tree will not be well-structured.

If conditions (m1) through (m5) are all true, then $M$ is re-used with new children $M_1 \ldots M_n$. This newly created sub-tree is unified with that part of the original tree T which remains, to produce T', and the parser terminates, accepting the new input.
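As an illustration, the five tests can be written as a single predicate. The following Python sketch is hypothetical: the Node class and its fields stand in for the parse tree attributes of Table 5-1, the nodes $M_1 \ldots M_n$ are passed as an explicit list rather than reached through stack(), and, following the matchcond routine of Section 5.2.1, the candidate parent $M$ is taken to be parent(irmark).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, like pointers
class Node:
    token: str
    parent: Optional["Node"] = None
    lthread: Optional["Node"] = None   # node beneath this one on the threaded stack
    rdescend: Optional["Node"] = None  # rightmost descendant

    def __post_init__(self) -> None:
        if self.rdescend is None:
            self.rdescend = self  # as in alloc(): a new node is its own rightmost descendant


def match_condition(lhs: str, alpha: List[Node], irmark: Optional[Node]) -> bool:
    """Tests (m1)-(m5) for the reduction lhs -> alpha, where alpha = [M_1, ..., M_n]."""
    # (m2): irmark must be one of M_1 .. M_n; k is its 1-based position.
    k = next((i + 1 for i, node in enumerate(alpha) if node is irmark), None)
    if k is None:
        return False
    m = irmark.parent  # candidate parent M, taken from the original tree T
    # (m1): M exists and is labelled with the left-hand side A.
    if m is None or m.token != lhs:
        return False
    # (m3): M_1 .. M_k were all children of M in T.
    if any(node.parent is not m for node in alpha[:k]):
        return False
    # (m4): the node beneath M_1 on the stack is not also a child of M.
    below = alpha[0].lthread
    if below is not None and below.parent is m:
        return False
    # (m5): the new sub-tree has the same right edge as the old one.
    return alpha[-1].rdescend is m.rdescend
```

Note that the parent pointers of $M_{k+1} \ldots M_n$ are never consulted, which agrees with the remark above that they are not necessarily based on the original tree T.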

5.2. Algorithm 5.1: The Ghezzi and Mandrioli Incremental LR(0) Parser

Having described the algorithm informally, we now present it in full in this section. This is Ghezzi and Mandrioli's LR(0) algorithm as published, but described using different terms. Table 5-1 gives the correspondence between the terms used here and those in the originally published paper. The different notation is used in part to provide longer, more mnemonic names for the attributes of the parse tree nodes, and to permit the algorithm to be described in terms that match the code used in the actual implementation. Curly braces are introduced for grouping, to make the algorithm more readable.

5.2.1. Routines used in the Parser

alloc():

  M ← a newly allocated node;
  addr(M) ← M;
  rdescend(M) ← M;
  return(M).
apply_match:

Let A → α be the reduction for which the matching condition holds.

  parent(stack(stacktop, j)) ← parent(irmark), ∀ 0 ≤ j < |α|;
  rmost(stack(stacktop, j)) ← stacktop, ∀ 0 ≤ j < |α|.

matchcond:

Let A → α be the reduction to be applied.

  if irmark = stack(stacktop, j), for some 0 ≤ j < |α|
    and parent(irmark) = parent(stack(irmark, h)), ∀ 0 ≤ h < |α| - j
    and parent(irmark) ≠ parent(stack(irmark, |α| - j))
    and token(parent(irmark)) = A
    and rdescend(stacktop) = rdescend(parent(irmark))
  then
    matchcond ← true;
  else
    matchcond ← false.

reduce(i, M):

Let i be production A → α.

  parent(stack(stacktop, j)) ← addr(M), ∀ 0 ≤ j < |α|;
  rmost(stack(stacktop, j)) ← stacktop, ∀ 0 ≤ j < |α|;
  token(M) ← A;
  let g be the goto function of pstate(stack(stacktop, |α|));
  pstate(M) ← g(A);
  rdescend(M) ← rdescend(stacktop);
  stacktop ← addr(M).

shift(M):

Let g be the goto function of pstate(stacktop).

  lthread(M) ← stacktop;
  pstate(M) ← g(token(M));
  stacktop ← addr(M).
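The routines above can be sketched in Python to show how the parse stack is threaded through the lthread links of the tree nodes themselves. This is a minimal sketch under stated assumptions: the goto function is a hypothetical dictionary keyed by (state, symbol), the tiny grammar in the usage test (E → a b) is invented for illustration, and the assignment lthread(M) ← stack(stacktop, |α|) in reduce is our own addition, made so that the |α| reduced nodes are popped; it is not spelled out in reduce() above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class Node:
    token: str = ""
    pstate: int = -1
    parent: Optional["Node"] = None
    lthread: Optional["Node"] = None   # next node down the threaded parse stack
    rmost: Optional["Node"] = None     # rightmost brother within its production
    rdescend: Optional["Node"] = None  # rightmost descendant


def alloc() -> Node:
    m = Node()
    m.rdescend = m  # rdescend(M) <- M
    return m


def stack(m: Node, j: int) -> Node:
    """stack(M, j): follow lthread j times (p^j(M) in Table 5-1)."""
    for _ in range(j):
        m = m.lthread
    return m


class ThreadedStack:
    def __init__(self, goto: Dict[Tuple[int, str], int], start_state: int = 0):
        self.goto = goto                  # hypothetical goto table
        self.stacktop = alloc()           # bottom-of-stack node B
        self.stacktop.token = "$B"
        self.stacktop.pstate = start_state

    def shift(self, m: Node) -> None:
        m.lthread = self.stacktop
        m.pstate = self.goto[(self.stacktop.pstate, m.token)]
        self.stacktop = m

    def reduce(self, lhs: str, length: int, m: Node) -> None:
        top = self.stacktop
        for j in range(length):
            child = stack(top, j)
            child.parent = m
            child.rmost = top
        m.token = lhs
        below = stack(top, length)        # the node uncovered by the reduction
        m.pstate = self.goto[(below.pstate, lhs)]
        m.rdescend = top.rdescend
        m.lthread = below                 # assumption: pops the |alpha| children
        self.stacktop = m
```

Because each node carries its own lthread link, popping |α| symbols costs only the single assignment to lthread(M); the reduced nodes remain in the tree as children of M.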


5.2.2. The Parser

Let T be the parse tree for the string w = xzy.

Let z' be a replacement string for z, and w' = xz'y the result.

1. Initialization

(1.1) if w ≠ ε (the empty string), then {
  let M be the node in T which stores the first symbol of zy;
  irmark ← lthread(M);
  stacktop ← lthread(M);
}

(1.2) if w = ε (i.e., w' is being parsed from scratch) then {
  irmark ← B;
  stacktop ← B;
}.

2. Analysis of z'

(2.1) Let f be the parsing action of pstate(stacktop).

Execute (a), (b), (c), or (d) according to the value of f.

(a) f = SHIFT.

  if the symbol to be shifted is first(y) then
    jump to (3);
  else {
    M ← alloc();
    token(M) ← next-symbol-from-the-input;
    shift(M);
  }.

(b) f = REDUCE i. Let i be the production A → α.

  if irmark = stack(stacktop, j) for some 0 ≤ j < |α| (i.e., irmark must be updated)
  then
    irmark ← stack(stacktop, |α|);
  M ← alloc();
  reduce(i, M).

(c) f = ERROR.

  Jump to the appropriate error recovery action.

(d) f = ACCEPT.

  The algorithm terminates, accepting the string so far scanned.

3. Analysis of y

(3.1) Let M be the node which stores the first symbol of y.

  oldtable ← pstate(M);
  shift(M);

(3.2) if oldtable ≠ pstate(stacktop) then
    jump to 3.3;

  Otherwise, skip steps of the analysis of y as follows:

  stacktop ← rmost(stacktop) (we enter directly in a reduction state).

  Let f be the parsing action of pstate(stacktop), where f = REDUCE i, i being production A → α.

  if matchcond holds then {
    apply_match;
    accept w', terminating the algorithm.
  }
  if irmark = stack(stacktop, j) for some 0 ≤ j < |α| then
    irmark ← stack(stacktop, |α|);
  oldtable ← pstate(parent(stacktop));
  if parent(stack(stacktop, j)) = parent(stack(stacktop, k)) ∀ 0 ≤ j, k < |α| then {
    the entire subtree of T rooted in parent(stacktop) is reused:
    M ← parent(stacktop);
    lthread(M) ← stack(stacktop, |α|);
    Let g be the goto function of pstate(stack(stacktop, |α|)).
    pstate(M) ← g(A).
  } else {
    a new node is allocated:
    M ← alloc();
    reduce(i, M);
  }
  Jump to 3.2.

(3.3) Let f be the parsing action of pstate(stacktop).

Execute (a), (b), (c), or (d) according to the value of f.

(a) f = SHIFT. Let M be the node corresponding to the next symbol of y.
  oldtable ← pstate(M);
  shift(M);
  jump to 3.2.

(b) f = REDUCE i. Let i be production A → α;
  if matchcond holds then {
    apply_match;
    accept w', terminating the algorithm.
  }
  if irmark = stack(stacktop, j) for some 0 ≤ j < |α| then
    irmark ← stack(stacktop, |α|);
  M ← alloc();
  reduce(i, M);
  jump to 3.3.

(c) f = ERROR.

  Jump to the appropriate error recovery action.

(d) f = ACCEPT.

  The algorithm terminates.


5.3. Modifications to the Parse Tree Node

In the previous chapter, we decided that parse tree access for other software tools would be improved by modifying some of the attributes of a parse tree node. In addition, parse tree access for the editor is improved if some additional attributes are added to each parse tree node. In this section we discuss these alterations.

5.3.1. Addition of a leftson Attribute

A leftson attribute has been added to each non-terminal parse tree node, so that each non-terminal node now contains a pointer to its leftmost child. The leftson attribute is not required by the incremental parser, so its addition has no effect on the operation of the algorithm, since the attribute is never referenced by it. The existence of this attribute requires only one additional assignment in the reduce() routine of the algorithm, to set the leftson attribute of the parent node to its leftmost child at the time that the reduction is being performed. This is accomplished by including the following statement in this routine just after the token attribute is set:

  leftson(M) ← stack(stacktop, |α| - 1);

This assignment is the only one necessary, since the only time the relationship between a parent and its children changes is during a reduction. All reductions occur in the reduce routine, with one exception: in apply_match, the parent node is re-used in performing the final reduction which terminates the reparse. In this situation, for the match condition to test true, the irmark variable must point to one of the nodes within this production. This can only occur if the node existed in the original tree, so the value of the leftson attribute will already be correctly set, and no further action is necessary.

5.3.2. Replacement of rmost by the sibling Attribute

The rmost attribute has been replaced by the sibling attribute. The rmost attribute contained a pointer to the rightmost brother of a node in a production; the new sibling attribute contains a pointer to the sibling to the immediate right of each node, or nil if the node is itself a rightmost sibling. The replacement of the rmost attribute by the sibling attribute does slightly affect the parsing algorithm, in that wherever the rmost attribute had been referenced, it is now necessary to traverse the sibling links to reach the rightmost brother of that node. This occurs in only one location, in section (3.2) of the algorithm, in the reparse optimization. The reference to rmost is made in order to obtain the rightmost sibling of a production which the parser has decided need not be reparsed, since reparsing it would have an identical outcome. The production is entered onto the parse stack simply by setting stacktop to the value of this attribute. The same result is obtained using the sibling attribute if the original statement in section (3.2):

  stacktop ← rmost(stacktop)

is replaced by:

  while sibling(stacktop) ≠ nil do
    stacktop ← sibling(stacktop);

While it may appear that replacing a simple assignment statement by a loop may slow the algorithm, our analysis at the end of chapter 4 showed that the average production rule length R in sample parse trees tends to be approximately 2 or less, depending on the grammar, so the additional work required is indeed small.

To maintain the rmost attribute, assignment statements were required in both apply_match() and reduce(), where the rmost attribute previously was set. The assignment statement:

  rmost(stack(stacktop, j)) ← stacktop, ∀ 0 ≤ j < |α|

is replaced in each instance by:

  sibling(stack(stacktop, j)) ← stack(stacktop, j - 1), ∀ 0 < j < |α|;
  sibling(stacktop) ← nil.

The same number of assignments to the sibling attribute of these nodes is required as before; only the value assigned is different, since each node now receives the address of the sibling to its immediate right instead of the rightmost sibling. It should be clear that by following this list of pointers, stacktop will always be set to the rightmost sibling at the conclusion of this loop.
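The following Python sketch (hypothetical names, not the SAGA code) shows the child-linking part of reduce() after both modifications of Sections 5.3.1 and 5.3.2: a single pass down the threaded stack sets each child's parent and sibling, and leaves leftson pointing at stack(stacktop, |α| - 1). The loop from section (3.2) then recovers the rightmost sibling.

```python
from typing import Optional


class Node:
    def __init__(self, token: str, lthread: Optional["Node"] = None):
        self.token = token
        self.lthread = lthread                 # node beneath this one on the stack
        self.parent: Optional["Node"] = None
        self.sibling: Optional["Node"] = None  # immediate right sibling, or nil
        self.leftson: Optional["Node"] = None  # leftmost child (non-terminals)


def link_children(m: Node, stacktop: Node, length: int) -> None:
    """Make the top `length` stack nodes the children of m."""
    right = None                  # sibling to the right of the current node
    node = stacktop
    for _ in range(length):
        node.parent = m
        node.sibling = right      # nil for stacktop, the rightmost child
        right = node
        node = node.lthread
    m.leftson = right             # i.e. stack(stacktop, length - 1)


def rightmost_sibling(node: Node) -> Node:
    # Replacement for rmost: walk the sibling links to the right end.
    while node.sibling is not None:
        node = node.sibling
    return node
```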


5.3.3. Chaining the Terminal Nodes Together

Two other attributes have been added to the parse tree node: the next and prev attributes. These are used to chain the terminal leaves of the parse tree together into a doubly-linked list. While these attributes are not necessary for the parsing algorithm, they are of great utility to the editor in producing the display and executing user commands. Since the parse tree is constructed and maintained by the incremental parsing algorithm, the maintenance of these additional attributes is best done by the algorithm. We add two new routines, chain and unchain, which add and remove nodes from this doubly-linked list:

chain(N, at, M):

Let N be a node in the frontier of T, M a node to be added, and at be one of BEFORE or AFTER.

  if at = BEFORE then {
    next(M) ← addr(N);
    prev(M) ← prev(N);
    if prev(N) ≠ nil then
      next(prev(N)) ← addr(M);
    prev(N) ← addr(M);
  }

  if at = AFTER then {
    next(M) ← next(N);
    prev(M) ← addr(N);
    if next(N) ≠ nil then
      prev(next(N)) ← addr(M);
    next(N) ← addr(M);
  }.

unchain(M):

  prev(next(M)) ← prev(M);
  next(prev(M)) ← next(M).

The chain routine only needs to be called from one location in the parser, at the point that a node corresponding to a token in z' is shifted by the parser. Since the nodes corresponding to the tokens in both x and y previously existed in T, their next and prev attributes are already correctly set. The unchain routine only needs to be called during parser re-initialization, to remove each of the terminal nodes associated with the tokens of z which are being deleted. Other calls will be needed for error handling, but these will be discussed later, when that topic is presented.
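A minimal Python sketch of the frontier list maintained by chain and unchain follows. The Terminal class and the frontier() helper are illustrative only; unlike the unchain pseudocode above, the sketch tests for nil at both ends of the list, an assumption that matters only when the node being removed is the first or last in the list.

```python
from typing import List, Optional

BEFORE, AFTER = "BEFORE", "AFTER"


class Terminal:
    def __init__(self, text: str):
        self.text = text
        self.next: Optional["Terminal"] = None
        self.prev: Optional["Terminal"] = None


def chain(n: Terminal, at: str, m: Terminal) -> None:
    """Insert m into the frontier list immediately BEFORE or AFTER n."""
    if at == BEFORE:
        m.next = n
        m.prev = n.prev
        if n.prev is not None:
            n.prev.next = m
        n.prev = m
    elif at == AFTER:
        m.next = n.next
        m.prev = n
        if n.next is not None:
            n.next.prev = m
        n.next = m
    else:
        raise ValueError(f"at must be BEFORE or AFTER, not {at!r}")


def unchain(m: Terminal) -> None:
    """Remove m from the frontier list (with nil checks at the ends)."""
    if m.next is not None:
        m.next.prev = m.prev
    if m.prev is not None:
        m.prev.next = m.next
    m.next = m.prev = None


def frontier(first: Optional[Terminal]) -> List[str]:
    """Read the frontier left to right, starting at `first`."""
    out = []
    while first is not None:
        out.append(first.text)
        first = first.next
    return out
```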

5.4. Extension of the Algorithm to LR(1)

Since programming languages of interest are easily expressible using LR(1) grammars, the algorithm is extended to include this class of context-free grammars. Almost all of the added complexity caused by this extension affects the parser-generator program itself, since different algorithms need to be used to produce the parse tables. But once the tables are produced, the only difference for the incremental parsing algorithm is that the parsing action becomes a function of both the state of the parser and the next symbol from the input. Since these parse table generation algorithms are well understood, and parser-generators for this class of grammars have been written and are commonly available, they will not be covered here.

To implement this extension in the parsing algorithm, we extend both f and g so that they depend upon both of these parameters. We introduce a function, nextsym, which returns the node corresponding to the next input token and advances nextusernode. If there are no new input nodes left, then the node in y to which nextnode points is returned, and nextnode is advanced. If nextnode is nil, then a node corresponding to the end-of-file token is generated and returned. The only time this occurs is during the very first parse of a new file; in all other invocations of the parser, the last node in the list headed by nextnode will be the end-of-file token, which the parser will never go past. If the end-of-file token is legal, then the parser will receive an accept action before a new node is needed; if not, then an unexpected end-of-file error will occur, and the parser will suspend at this point. Thus, nextsym is defined as:
nextsym():

Let M be a pointer to a parse tree node.

  if nextusernode ≠ nil then {
    M ← nextusernode;
    nextusernode ← next(nextusernode);
  } else if nextnode ≠ nil then {
    M ← nextnode;
    nextnode ← next(nextnode);
  } else {
    M ← alloc();
    token(M) ← eof; (the end-of-file token code)
  }
  return(M).
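A direct Python transcription of nextsym follows, as a sketch. It assumes, as the description above implies, that the z' nodes form their own nil-terminated list, separate from the y list; EOF, make_list and the Node and Input classes are hypothetical stand-ins.

```python
from typing import List, Optional

EOF = "eof"  # hypothetical end-of-file token code


class Node:
    def __init__(self, token: str):
        self.token = token
        self.next: Optional["Node"] = None


def make_list(tokens: List[str]) -> Optional[Node]:
    """Build a nil-terminated, next-linked list of nodes."""
    head = None
    for tok in reversed(tokens):
        node = Node(tok)
        node.next = head
        head = node
    return head


class Input:
    def __init__(self, nextusernode: Optional[Node], nextnode: Optional[Node]):
        self.nextusernode = nextusernode  # remaining new nodes of z'
        self.nextnode = nextnode          # remaining old nodes of y

    def nextsym(self) -> Node:
        if self.nextusernode is not None:
            m = self.nextusernode
            self.nextusernode = m.next
        elif self.nextnode is not None:
            m = self.nextnode
            self.nextnode = m.next
        else:
            m = Node(EOF)  # only reached on the very first parse of a new file
        return m
```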


This extension requires a change to the algorithm, as follows. Wherever

  Let f be the parsing action of pstate(stacktop)

appears, it is replaced by:

  Let f be the parsing action of pstate(stacktop) and nextsym().

Wherever

  Let g be the goto function of pstate(...)

appears, it is replaced by:

  Let g be the goto function of pstate(...) and nextsym().

Modifications to the incremental parsing algorithm occur in part (2.1), the end of part (3.2), part (3.3), and in the routines shift and reduce.

The major change to the parser occurs in the initialization section, since it is no longer sufficient to initialize stacktop to the stack pointer contained in activenode. Because of the lookahead requirement, it may now happen that the parse action previously taken on the node preceding activenode will differ from the action that will be taken during this parse, since activenode will not necessarily be the lookahead token this time; the first token in z' will be instead. Therefore, the parser must back up one token (since the grammar is LR(1)) and then reset the parser variables using this node instead. Section (1.1) must be altered to reset M to prev(M) before any of the assignments are made. Thus, section (1.1) becomes:

(1.1) if w ≠ ε (the empty string), then {
  let M be the node in T which stores the first symbol of zy;
  M ← prev(M);
  irmark ← lthread(M);
  stacktop ← lthread(M);
}


5.5. Redefinition of the reduce Operation

To permit the editor to pass non-terminal nodes to the parser in the nextusernode list (corresponding to tokens in z'), and to permit the f and g functions to be combined (covered in the next section), the reduce action is redefined to prepend the parent node onto the input instead of placing it on the top of the parse stack. Routines nextsym, reduce and shift need to be modified, and a new routine, prepend, added to place the new parent node at the head of the input list.

Since the parent node which is to be prepended to the input list will immediately be shifted during the next step of the parser, it is sufficient to define a new variable savenode to retain this node, and add an initial test to the nextsym routine to return the node assigned to savenode if it is non-nil, setting savenode to nil when this is done. The new routine prepend is defined as follows:

prepend(M):

  savenode ← M.

The new test, placed at the beginning of nextsym, is:

  if savenode ≠ nil then {
    M ← savenode;
    savenode ← nil;
  } else ...

Several lines of code at the end of the reduce routine, which place M onto the parse stack, are deleted. These lines are:

  let g be the goto function of pstate(stack(stacktop, |α|));
  pstate(M) ← g(A);
  stacktop ← addr(M).

This modification simplifies the parsing algorithm, since redundant code for shifting nodes is removed from reduce.
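The effect of prepend can be sketched by layering the single-slot savenode test over an existing input source. In this hypothetical Python sketch, the remaining input is modelled by a deque rather than by the nextusernode and nextnode lists, and prepend asserts that the slot is empty, reflecting the observation above that the prepended node is always shifted at the very next step.

```python
from collections import deque
from typing import Iterable, Optional


class Node:
    def __init__(self, token: str):
        self.token = token


class Input:
    def __init__(self, nodes: Iterable[Node]):
        self.savenode: Optional[Node] = None
        self.rest = deque(nodes)  # stands in for the nextusernode / nextnode lists

    def prepend(self, m: Node) -> None:
        # One slot suffices: the prepended parent is shifted immediately.
        assert self.savenode is None, "savenode already occupied"
        self.savenode = m

    def nextsym(self) -> Node:
        if self.savenode is not None:  # the new test at the start of nextsym
            m, self.savenode = self.savenode, None
            return m
        return self.rest.popleft() if self.rest else Node("eof")
```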

5.6. Combining the f and g Parsing Functions

At each step in an incremental reparse, the parser first determines the parsing action f and then, a short time later, the goto function g. We have redefined the reduction action as a reduction by a given production rule number, with the resulting parent node prepended to the input rather than immediately placed on top of the parse stack. With this redefinition, g now depends upon the current parse state and the head of the input, just as f has always done. The goto function in a reduction, which previously depended upon the parse state uncovered in the parse stack, has become the goto function of a shift action on a non-terminal symbol.

This uniformity permits us to combine the f and g functions into a single action routine. This is a desirable alteration, since parse tables usually code both the action and the new state or rule number together in a single entry. By retrieving both with a single call, we eliminate a duplicate lookup that would otherwise have to occur at each step in the parse.

The action routine takes as arguments the current parse state and input symbol, as f and g did before, and returns two values. In the case of a shift or accept action, these new values correspond to the old values returned by f and g, namely the action and the new parse state. In the case of a reduce action, the second value is assigned the production rule number by which the reduction is to be made. In other cases, this second value is unused.

This change is incorporated into the incremental parsing algorithm by modifying sections (2.1), (3.2) and (3.3). The introductory line:

  Let f be the parsing action of pstate(stacktop) and nextsym()

is replaced by an explicit call to the action routine:

  action(pstate(stacktop), nextsym(), f, newvalue)

where f now becomes a variable assigned by action, and the variable in the position of newvalue above is assigned either the new parse state or a production rule number, according to the value of f.

The shift routine must also be modified to accept the new parse state, now contained in newvalue, since it is no longer necessary to call g from within it; the call to g is replaced by the value of this variable. No further modifications are necessary, and it should be evident that the algorithm still runs as before, since its flow is still the same; only the form in which this information is passed has changed. The significance of this change is that the efficiency of the algorithm is improved, and we gain the ability to pass an intact sub-tree to the parser.
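To illustrate the combined routine, the following Python sketch packs each table entry as an (action, value) pair and runs a conventional, non-incremental LR driver over it. The grammar (rule 1: E → a b), the state numbers and the "$" end marker are invented for the example. Reductions push the left-hand side back onto the input, as redefined in Section 5.5, so the old goto entry for (state 0, E) appears simply as a shift on a non-terminal.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class Act(Enum):
    SHIFT = auto()
    REDUCE = auto()
    ACCEPT = auto()
    ERROR = auto()


# Hypothetical packed table: value is a new state for SHIFT, a rule number for REDUCE.
TABLE: Dict[Tuple[int, str], Tuple[Act, Optional[int]]] = {
    (0, "a"): (Act.SHIFT, 1),
    (1, "b"): (Act.SHIFT, 2),
    (2, "$"): (Act.REDUCE, 1),
    (0, "E"): (Act.SHIFT, 3),  # former goto entry, now a shift on a non-terminal
    (3, "$"): (Act.ACCEPT, None),
}
RULES = {1: ("E", 2)}  # rule number -> (left-hand side, |alpha|)


def action(state: int, symbol: str) -> Tuple[Act, Optional[int]]:
    """One lookup returns both the action f and the new state or rule number."""
    return TABLE.get((state, symbol), (Act.ERROR, None))


def parse(tokens: List[str]) -> List[str]:
    states = [0]
    inp = list(tokens) + ["$"]
    saved: Optional[str] = None  # plays the role of savenode
    trace = []
    while True:
        sym = saved if saved is not None else inp[0]
        f, newvalue = action(states[-1], sym)
        if f is Act.SHIFT:
            states.append(newvalue)
            if saved is not None:
                saved = None
            else:
                inp.pop(0)
            trace.append(f"shift {sym}")
        elif f is Act.REDUCE:
            lhs, length = RULES[newvalue]
            del states[len(states) - length:]
            saved = lhs  # prepend the parent to the input
            trace.append(f"reduce {newvalue}")
        elif f is Act.ACCEPT:
            return trace
        else:
            raise SyntaxError(f"unexpected {sym!r} in state {states[-1]}")
```

For the input a b, the trace is shift a, shift b, reduce 1, shift E, after which the accept entry is reached.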

5.7. Extension to Support Grammars with epsilon Productions

The parsing theory for LR(1) context-free grammars is well developed, and epsilon productions (productions with empty right hand sides) are well understood. While any grammar containing epsilon productions can be represented by an equivalent grammar with none (apart from the generation of the empty string itself) [Hopcroft and Ullman, 69], it is much more convenient for the language implementor to be able to use epsilon rules in his specification.

The addition of epsilon rules adds some complexity to the algorithm. First, their representation in the tree must be decided. Two approaches are common: the first places nil pointers into the parents of epsilon productions; the second represents the empty right hand side with an epsilon terminal node, and adds a token code to the terminal vocabulary of the grammar. The second method has been chosen for the SAGA editor, since it results in a more uniform parse tree. All non-terminal nodes always have at least one child, and when descending through the tree toward the frontier, one is guaranteed to eventually reach a terminal node.

The initialization of the algorithm is affected, since it is no longer sufficient to use prev(activenode) to initialize the parser. Any section of the frontier can contain an arbitrarily long sequence of epsilon nodes, depending on the form of the grammar in use. Therefore, it is necessary to check the token type of the preceding node and, if it is an epsilon token, to continue traversal back along the prev links. Since the stack is finite, either a non-epsilon terminal token or the bottom-of-stack node B will eventually be reached. This node can then be used to initialize the parser. In the initialization of the parser, part (1.1) is replaced by the following code:

(1.1) if w ≠ ε, then {
  let M be the node in T which stores the first symbol of zy;
  M ← prev(M);
  while token(M) = ε do
    M ← prev(M);
  irmark ← lthread(M);
  stacktop ← lthread(M);
}
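The backing-up step of the revised (1.1) can be sketched in Python. The EPSILON code, the Node class and the function name reparse_start are hypothetical; as in the text, the sketch relies on the bottom-of-stack node B heading the prev chain, so the loop cannot run off the front of the list.

```python
from typing import Optional, Tuple

EPSILON = "<eps>"  # hypothetical token code for epsilon leaves


class Node:
    def __init__(self, token: str, prev: Optional["Node"] = None,
                 lthread: Optional["Node"] = None):
        self.token = token
        self.prev = prev        # previous node on the frontier
        self.lthread = lthread  # node beneath this one on the old parse stack


def reparse_start(first_of_zy: Node) -> Tuple[Node, Node]:
    """Return the initial (irmark, stacktop) for section (1.1)."""
    m = first_of_zy.prev       # back up one token for the LR(1) lookahead
    while m.token == EPSILON:  # skip any run of epsilon leaves
        m = m.prev             # stops at the latest at B, which is never epsilon
    return m.lthread, m.lthread
```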


Part (2.1)(b) is affected, since in the production $A \to \alpha$, $\alpha$ can now be of length zero. In this case, we shift an epsilon terminal node onto the stack, and then perform a reduction, using a length of 1 for the rule instead of 0. Replace "M ← alloc(); reduce(i, M)" in (2.1)(b) with:

(2.1)(b)

  if |α| > 0 then {
    M ← alloc();
    reduce(i, M);
  } else {
    N ← alloc();
    token(N) ← ε;
    shift(N, -1);
    M ← alloc();
    reduce(i, M);
  }.

The shift routine is passed an unused parse state code, in this case -1, for assignment to the pstate field, since the epsilon token has no goto function. The reduce routine must be modified to check whether $|\alpha| = 0$, and if so to assume that $|\alpha| = 1$ instead, since an epsilon node now resides on the stack. The matchcond routine needs an identical check, although the test will always fail when called with the parent of an epsilon node, since a production must have a length of at least 1 in order to pass all of the tests.
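A Python sketch of the epsilon case of (2.1)(b) follows, using a threaded stack in the style of Section 5.2.1, that is, before the redefinition of Section 5.5, so the new parent is placed directly on the stack. The goto table, EPSILON code and function names are hypothetical. The unused state -1 given to the epsilon node is harmless because that node is always popped by the reduction that immediately follows, so its pstate is never used for a goto.

```python
from typing import Dict, Optional, Tuple

EPSILON = "<eps>"  # hypothetical token code for the epsilon terminal
UNUSED_STATE = -1  # parse state given to epsilon nodes


class Node:
    def __init__(self, token: str, pstate: int = UNUSED_STATE):
        self.token = token
        self.pstate = pstate
        self.lthread: Optional["Node"] = None
        self.parent: Optional["Node"] = None
        self.leftson: Optional["Node"] = None


def reduce(stacktop: Node, lhs: str, rhs_len: int,
           goto: Dict[Tuple[int, str], int]) -> Node:
    length = max(rhs_len, 1)  # |alpha| = 0 is treated as 1: an epsilon node is on the stack
    m = Node(lhs)
    node = stacktop
    for _ in range(length):
        node.parent = m
        m.leftson = node      # ends at the leftmost child
        node = node.lthread
    m.lthread = node          # pop the children
    m.pstate = goto[(node.pstate, lhs)]
    return m                  # the new stacktop


def reduce_step(stacktop: Node, lhs: str, rhs_len: int,
                goto: Dict[Tuple[int, str], int]) -> Node:
    """Part (2.1)(b) extended for epsilon productions."""
    if rhs_len == 0:
        eps = Node(EPSILON, UNUSED_STATE)
        eps.lthread = stacktop  # shift(N, -1)
        stacktop = eps
    return reduce(stacktop, lhs, rhs_len, goto)
```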

A modification to nextsym is also required, to test for an epsilon token and delete it from the list. A while loop suffices, which will continue testing tokens until one is found which is not an epsilon token. Because the editor produces an initial parse tree during initialization on a new file, the last token in the nextnode list will always be the end-of-file token, so this loop will always succeed in locating a non-epsilon token.

In nextsym, the following code fragment:

  } else if nextnode ≠ nil then {
    M ← nextnode;
    nextnode ← next(nextnode);
  } else {

is replaced by:

  } else if nextnode ≠ nil then {
    while nodetype(nextnode) = ε do
      nextnode ← next(nextnode);
    M ← nextnode;
    nextnode ← next(nextnode);
  } else {
It should be evident that any epsilon tokens existing in the original tree after the point of the modification which are reached by the parser will be removed, and that the parser will receive the correct lookahead token. If the epsilon tokens should be retained in the new tree, the parser will produce new nodes as necessary, when directed by the action routine to perform reductions in which the length of the right hand side of the production rule is zero.

No other modifications are needed to support epsilon rules. We now turn our attention to comment tokens and their handling.

5.8. Extension to Support Comments

Providing support for comments is one of the more challenging tasks for language-based editors. Programming languages which include comments typically permit them to appear between any two tokens in the input; some, such as the C programming language [Kernighan and Ritchie, 78], also permit them even within tokens, between any two characters. This flexibility is easy for a batch compiler to support, since all comments are stripped out of the input and discarded as soon as they are read, and do not affect further processing of the input data. But language-based editors do not have this option, since they are expected to retain and display a user's comments along with his program text. Unfortunately, while a lexical class for comments is easily definable, incorporating a comment token into the production rules of a grammar is usually not possible; if all of the permissible locations for comments in the language are specified in the grammar, it becomes ambiguous and cannot be successfully processed by a parser-generating system. An alternate method of handling comments needs to be used.

Some syntax-directed editors solve this problem by restricting the locations at which
comments can appear so that comment tokens can be specified in the formal description of the
language [Teitelbaum and Reps, 81]. Comments are required in certain locations, such as
preceding procedure declarations, permitted in others, and prohibited in most remaining ones.
Restrictions such as these are often justified by the implementors by stating that they are producing a
structured environment, that the use of comments is to be encouraged, and that by limiting them
to a few key locations, they are encouraging a more standardized development style.

Other editors [Horton, 81] construct comment tokens and attach them to a nearby terminal
token in the parse tree. This has the advantage of hiding the comment from the parser, but the
disadvantage of forcing the comment to be treated as an attribute of a neighboring node, when
no such relationship necessarily exists. There is also the added problem of deciding whether to
attach the comment to the preceding or following parse tree node. This is often solved either by
picking one by default and letting the user override the choice, or by prompting the user for the
node to use each time he enters a comment. This choice is not simple: a comment documenting
a routine is usually placed before the routine in the file, immediately preceding the first token in
the routine, while a comment documenting a variable declaration is usually placed after the
declaration (and any trailing punctuation that may be present). The node to which the comment
should be attached can be inferred from the surrounding context if language-dependent
information is used, but this approach still does not know where to place the comment when a
syntax error occurs.

The SAGA editor uses a third, and new, approach. Comments are tokenized by the lexical
analyzer and allocated their own terminal node, one per comment. These nodes are attached to
the parse tree along the prev/next doubly-linked list of terminal nodes in the parser. Each time
a comment token is detected in the input, it is linked into this list, and the following token is
retrieved from the input to be passed to the action routine, so that the action routine never
encounters a comment token, and the parse tables do not need entries for comments. Since the
prev/next list is not used by the algorithm, once the comment tokens are in the tree they are never
seen again by the parser. Even walking the tree in the traditional way from the root will not
discover any comment tokens in the tree, so routines which are concerned only with syntactic
structure need not be modified to process comments, even though comments can in fact occur between
any two tokens in the parse tree. In subsequent editing, a comment is included in the
operation being performed if it is selected by the user, and not otherwise. Routines which need
to process comments while walking the tree can do so by checking the next attribute of each
terminal node they encounter and testing the token attribute of the node it refers to.
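A small sketch of this arrangement, assuming a hypothetical `Node` record with a `children` list and `prev`/`next` links (not the editor's actual node layout): the comment node is spliced into the frontier list but is never made a child of any nonterminal, so a root-down walk skips it while a walk along the next links finds it.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

COMMENT = "commentcode"  # hypothetical token code for comment tokens


@dataclass(eq=False)
class Node:
    token: str
    children: List[Node] = field(default_factory=list)  # empty for terminals
    prev: Optional[Node] = None
    next: Optional[Node] = None


def chain_after(at: Node, new: Node) -> None:
    """Link `new` into the prev/next frontier list just after `at`."""
    new.prev, new.next = at, at.next
    if at.next is not None:
        at.next.prev = new
    at.next = new


def walk(root: Node) -> List[str]:
    """Traditional root-down walk: sees only nodes reachable via children."""
    if not root.children:
        return [root.token]
    return [t for c in root.children for t in walk(c)]


def frontier(first: Node) -> List[str]:
    """Walk along the next links, which do include comment nodes."""
    out: List[str] = []
    node: Optional[Node] = first
    while node is not None:
        out.append(node.token)
        node = node.next
    return out


# Tree for "x := 1":  assign -> id ':=' num
ident, becomes, num = Node("id"), Node(":="), Node("num")
root = Node("assign", [ident, becomes, num])
chain_after(ident, becomes)
chain_after(becomes, num)

# A comment typed between ':=' and '1' gets its own terminal node, linked
# into the frontier but never made a child of any nonterminal.
chain_after(becomes, Node(COMMENT))

print(walk(root))       # ['id', ':=', 'num']
print(frontier(ident))  # ['id', ':=', 'commentcode', 'num']
```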

Many programming languages permit single comments to span more than one line. While
the comment text could be stored in one long string and a single node allocated for the entire
comment, this representation is not convenient for the routines which must track the position of
the editing cursor as it moves past such comments. Therefore, multi-line comments are
represented by a comment tree of unit height, in which the text of each separate line of the
multi-line comment is stored separately and allocated its own terminal node. A single
nonterminal node is allocated to be the parent of all of these terminal nodes. This parent node is
linked into the prev/next terminal list in the parse tree, so that the comment is represented by a
single token. At the same time, by accessing the children of this node, information about the
formatting of the comment across lines can be obtained without needing to actually read the text
string itself, making the calculations for editing cursor positioning more efficient.
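The cursor calculation can be illustrated with a hypothetical representation in which each line of a comment becomes a child terminal that caches its own length. The names `Terminal`, `CommentTree` and `cursor_after` are invented for this sketch, which also assumes that each continuation line's stored text includes its own leading blanks, so that line starts in column 0.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Terminal:
    text: str
    length: int = field(init=False)

    def __post_init__(self) -> None:
        # Cached once, when the lexical analyzer builds the node.
        self.length = len(self.text)


@dataclass
class CommentTree:
    """Unit-height tree: one parent node, one terminal child per line."""
    lines: List[Terminal]


def make_comment(source: str) -> CommentTree:
    return CommentTree([Terminal(line) for line in source.split("\n")])


def cursor_after(comment: CommentTree, row: int, col: int) -> Tuple[int, int]:
    """Screen position just past a comment that starts at (row, col).

    Only the number of children and their cached lengths are consulted;
    the comment text is not rescanned.
    """
    last = comment.lines[-1]
    if len(comment.lines) == 1:
        return row, col + last.length
    return row + len(comment.lines) - 1, last.length


c = make_comment("(* first line\n   second line\n   end *)")
print(cursor_after(c, 10, 4))  # (12, 9)
```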

The major change to the parsing algorithm is to the lexical analyzer, which must recognize
a multi-line comment and construct the tree described above. Section (1.1) of the initialization
must also be modified to back along the frontier past comment tokens as well as epsilon tokens.
The while loop becomes:

while token(N) = ε or token(N) = commentcode do
    N ← prev(N);
An additional parse action, noaction, is added to the possible parse actions which can be taken by
the parser. When a comment token is detected, the parser takes this parsing action instead of
passing the parse state and comment to the action routine for lookup in the parse tables. If the
parser is parsing z', this action causes the node representing the token to be chained into the
terminal list of the parse tree; if y is being reparsed, then no action is required, and the parser
simply moves on to the next token.

Specifically, both sections (2.1) and (3.3) of the parsing algorithm need to be modified to
add a fifth parse action (e), in which f is NOACTION. In section (2.1) only, a call is made to the
chain routine to insert the comment node into the terminal list.
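The effect of NOACTION can be sketched as an interception in front of the table lookup. In this Python sketch the parse table is a caller-supplied function, the token code `COMMENT` and the helper names are hypothetical, and the parse state is held fixed at 0 because only the dispatch is being shown.

```python
from typing import Callable, List

COMMENT = "commentcode"   # hypothetical token code for comments
NOACTION = "NOACTION"

Table = Callable[[int, str], str]   # (parse state, token) -> parse action


def parse_action(state: int, token: str, table: Table) -> str:
    # Comments are intercepted before the table lookup, so the parse
    # tables need no entries for them.
    if token == COMMENT:
        return NOACTION
    return table(state, token)


def run(tokens: List[str], table: Table, reparsing_y: bool) -> List[str]:
    """Return, in order, what this pass does with each token."""
    ops: List[str] = []
    state = 0  # held fixed; a real parser would use pstate(stacktop)
    for tok in tokens:
        f = parse_action(state, tok, table)
        if f == NOACTION:
            # (2.1e): chain the comment node into the terminal list.
            # (3.3e): while reparsing y it is already linked; do nothing.
            if not reparsing_y:
                ops.append("chain " + tok)
        else:
            ops.append(f + " " + tok)  # left to cases (a)-(d)
    return ops


seen_by_table: List[str] = []


def toy_table(state: int, token: str) -> str:
    seen_by_table.append(token)   # record every real table lookup
    return "SHIFT"


print(run(["id", COMMENT, ":="], toy_table, reparsing_y=False))
# ['SHIFT id', 'chain commentcode', 'SHIFT :=']
print(seen_by_table)  # ['id', ':=']
```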

5.9. Extension to Support Exception Handling for Errors

Error handling is a difficult issue, and one which significantly complicates the parsing
algorithm. Many syntax-directed editors avoid the issue by limiting the user to operations which
permit only a correct program to be produced. But these limitations are overly restrictive,
making many simple modifications tedious. By permitting a user to make changes which take a
program through intermediate incorrect states, much more flexible editing becomes possible. The
SAGA editor has followed this approach.

The first question which arises in error handling is whether to provide error correction or
error recovery. Error correction can simplify the implementation, since a trap-door error
recovery mechanism can be used to restore a correct environment and permit the parser to
continue to completion. However, the correction method used often restructures the input into a
different form than the user intended, and can create more work for the user to restore his
correct input from the system-corrected result than if he simply fixed the original error. Error
recovery does not repair the error automatically, but permits the editor to continue in operation
until some later time when the user can repair the error himself. To recover from an error, the
implementation must be able to save the state of the parse and local tree structure for later
continuation, suspend the parse, and return to the editor.

Figure 5-1: Parse tree produced by the insertion of “type color = (blue, green, red);”. The
rectangle gives the display on the user's terminal. In addition to the links shown, each node is also
linked to its rightmost descendant, and each terminal node is contained in a doubly-linked list
connecting it to the immediately preceding and following terminal nodes along the frontier of
the parse tree. To avoid clutter, these links have been omitted since they can be determined by
inspection of the parse tree.

Figure 5-2: Parse tree from Figure 5-1 after the incorrect insertion of “intensity : integer;”.
The text which was not successfully parsed is highlighted on the screen. A marker token has
been inserted into the tree to note the point of the error.


5.9.1. Single Exception Handling

The SAGA parser divides exceptions into two types: errors and suspensions. An error
occurs when a syntax error action is returned to the parse tree constructor by the action routine.
A suspension occurs either when a user requests a partial parse and the parser finishes parsing z',
or when the parser attempts to perform a reduction and detects an insufficient context on the
parse stack, caused by a previous error or suspension.


Figure 5-3: Parse tree from Figure 5-1 after the incorrect insertion of “hue” before the equal
sign. In this case, since the following terminal node had its lthread attribute set to a terminal
node, this attribute has been reset to point to the marker node.


To process errors, a third type of parse tree node is introduced: the marker node. In
addition, a new attribute nodetype is added to all nodes; this attribute will be set to TERM, NT, or
MARKER, according to whether the node is a terminal, non-terminal, or marker node.

When an error occurs, the parser allocates a marker node, takes the offending terminal node
and all of the remaining nodes in the nextusernode list (which correspond to the tokens in z' not
yet parsed), and makes them children of the marker. The terminal nodes are linked into the
prev/next list. If the lthread attribute of the node following the rightmost child of the marker
node points to a terminal node, it is reset to point to the marker node, as are the lthread
attributes of any of its parents which also point to the same node. This relinking guarantees that the
parser will detect the marker if it later reparses a section following this one and a reduction
brings it into this area of the tree. The current value of stacktop is saved in the marker node, for
later restoration of the parse stack if a parse is resumed at this point in the tree. An example of
the handling of a syntax error is illustrated in Figures 5-1, 5-2 and 5-3.
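The relinking step can be sketched in Python. This is an illustration under simplifying assumptions, not the editor's code: children are kept in a Python list instead of leftson/sibling links, the hypothetical `mark_error` stands in for the ERROR branch of exception as it runs in section (2), and climbing the `parent` chain while the lthread value matches is this sketch's reading of "any of its parents which also point to the same node".

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

TERM, NT, MARKER = "TERM", "NT", "MARKER"


@dataclass(eq=False)
class Node:
    token: str
    nodetype: str = TERM
    parent: Optional[Node] = None
    children: List[Node] = field(default_factory=list)
    prev: Optional[Node] = None
    next: Optional[Node] = None
    lthread: Optional[Node] = None  # stack link; a marker keeps the saved stacktop here


def chain_before(new: Node, at: Node) -> None:
    """Link `new` into the frontier list just before `at`."""
    new.next, new.prev = at, at.prev
    if at.prev is not None:
        at.prev.next = new
    at.prev = new


def mark_error(bad: Node, unparsed: List[Node], activenode: Node,
               stacktop: Optional[Node]) -> Node:
    """Sketch of the ERROR branch of exception while analysing z'."""
    marker = Node("marker", MARKER, lthread=stacktop)
    for n in [bad, *unparsed]:
        n.parent = marker
        marker.children.append(n)
        chain_before(n, activenode)  # unparsed text stays visible on screen
    old = activenode.lthread
    if old is not None and old.nodetype == TERM:
        # Redirect activenode and each ancestor that shares its lthread, so
        # a later stack trace through this region runs into the marker.
        node: Optional[Node] = activenode
        while node is not None and node.lthread is old:
            node.lthread = marker
            node = node.parent
    return marker


def tokens(first: Node) -> List[str]:
    out: List[str] = []
    node: Optional[Node] = first
    while node is not None:
        out.append(node.token)
        node = node.next
    return out


# Frontier "id =" where '=' (activenode) and its parent are threaded to 'id'.
ident = Node("id")
eq = Node("=", lthread=ident)
chain_before(ident, eq)
rhs = Node("rhs", NT, lthread=ident)
eq.parent = rhs
decl = Node("decl", NT)
rhs.parent = decl

bad, semi = Node("hue"), Node(";")
m = mark_error(bad, [semi], eq, stacktop=ident)
print(tokens(ident))                                            # ['id', 'hue', ';', '=']
print(eq.lthread is m, rhs.lthread is m, decl.lthread is None)  # True True True
```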

When a suspension is indicated, the parser allocates a marker node and links it directly into
the prev/next terminal list just before the node that would be returned by the next call to
nextsym. The node following the marker has its lthread attribute reset to point to the marker
node, as do any of its parents whose lthread attribute is identical. The current value of stacktop
is stored in the marker node, and the parser returns to the editor.
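Saving stacktop in the marker is enough to restore the whole parse stack, because the stack is threaded through the nodes' lthread links. A minimal Python sketch, with hypothetical names `shift`, `suspend` and `stack_states` (the first loosely follows the shift routine in the final listing):

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    token: str
    pstate: int = 0
    lthread: Optional[Node] = None   # next node down the implicit parse stack
    nodetype: str = "TERM"


def shift(n: Node, newstate: int, stacktop: Optional[Node]) -> Node:
    """Like the shift routine: thread n onto the stack; return the new top."""
    n.lthread = stacktop
    n.pstate = newstate
    return n


def suspend(stacktop: Optional[Node]) -> Node:
    """The marker remembers stacktop, as exception does with lthread(M)."""
    return Node("marker", nodetype="MARKER", lthread=stacktop)


def stack_states(top: Optional[Node]) -> List[int]:
    """Recover the parse-state stack, top first, by following lthread."""
    states: List[int] = []
    while top is not None:
        states.append(top.pstate)
        top = top.lthread
    return states


bottom = Node("B", pstate=0)           # bottom-of-stack node
top = shift(Node("type"), 3, bottom)
top = shift(Node("id"), 7, top)
m = suspend(top)
# Later, resuming the parse at this marker restores the saved stack.
print(stack_states(m.lthread))         # [7, 3, 0]
```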

By tokenizing all new input before invoking the parser and linking the nodes into the
terminal list when an error or suspension occurs, the editor can display the unparsed nodes even though
the parser has not yet completely incorporated them into the internal structure of the parse tree.
This ability is important, since it permits the user to view his input at the points of
discontinuity, and even perform further modifications before, at, or after these points. Since the marker
node is an integral part of the parse tree, trees containing errors can be saved between editor
sessions and repaired at a later time.

A number of modifications to the parsing algorithm are necessary to support exception
handling. First, a new routine exception is introduced, to mark the point of discontinuity in the
parse tree:

exception(kind):

    Let kind be either ERROR or NOACTION, according to whether a syntax error
    or a suspension has occurred.
    Let M be a marker node, to note the point of error.
    Let N be the incorrect parse node, or nil if a suspension has occurred.

    M = alloc();
    if kind = ERROR then {
        leftson(M) = N;
        parent(N) = addr(M);
        if the parser is in section (2) then {
            chain(N, BEFORE, activenode);
            chain(N_j, BEFORE, activenode), ∀1 ≤ j ≤ n, where the N_j are on the
                nextusernode list;
            parent(N_j) = addr(M), ∀1 ≤ j ≤ n.
        } (otherwise we are reparsing y, and N is already in the terminal list)
    } else {
        leftson(M) = nil;
        chain(M, BEFORE, activenode);
    }
    lthread(M) = stacktop;
    if nodetype(lthread(activenode)) = TERM then
        lthread(N_k) = addr(M), ∀N_k, where lthread(N_k) = lthread(activenode).


Since the parser can now terminate in one of two ways, either by a completion or a suspension,
a fourth section is added to the algorithm to handle termination. If the parser completes,
accepting the modification just made, then the algorithm jumps to (4.1). Section (3.3d) is
changed from “the algorithm terminates” to “jump to (4.1)”. If the parser suspends, then it will
jump to (4.2) to terminate. To handle errors, sections (2.1c) and (3.3c) of the incremental
parsing algorithm must be altered from “jump to the appropriate error recovery action” to
“exception(ERROR); jump to (4.2)”. In addition, a test is inserted at the beginning of section (3)
to determine whether a partial parse has been requested by the user, and if so, the code
“exception(NOACTION); jump to (4.2)” is executed.

These modifications are sufficient to recover from an initial exception which the parser
might encounter. If subsequent parsing is now restricted to requiring the repair of this error
before permitting any other editing, then no further alterations are necessary, and the extensions to
the parser are finished. But a practical editor should be more flexible than this, and so we will
investigate parsing in the midst of multiple errors.

5.9.2. Multiple Exception Handling

By making two other extensions, to permit the parser to encounter marker nodes in its
input, and to detect marker tokens in the parse stack when a reduction is about to be performed,
we can relax the single error restriction, and permit editing anywhere within the tree no matter
how many errors or suspensions are outstanding.

A parse tree containing a single error or suspension point will have either a marker node in
the terminal list, or a continuous sequence of one or more unparsed nodes, all with their parent
attribute set to the marker node which manages the discontinuity. If a parse can occur elsewhere
in a tree containing one of these discontinuities, then the marker or an unparsed node can be
encountered in one of three ways: (1) during reinitialization, (2) if the nextnode variable becomes
set to one of these nodes, and nextsym is called to return the next node as the parser moves
forward, or (3) if a marker node is found on the parse stack during a reduction operation. If each of
these cases is addressed, then parsing can be permitted anywhere along the frontier of the tree no
matter how many points of discontinuity exist in the parse tree.

During reinitialization, the parser backs along the frontier immediately before activenode,
to find the most recent token (excluding epsilon tokens and comments) that previously had been
shifted by the parser. If during this operation an unparsed or marker node is encountered, then
the initialization cannot be completed, since there is no previous parse context to retrieve. The
user's modification can still be permitted, however, by deleting the number of nodes specified,
and then calling exception(NOACTION) to link the new input from the nextusernode list into the
frontier together with the marker node. The new input will be retained in the frontier of the
tree, but its parsing will need to be deferred until the earlier exception is repaired.
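The backward scan can be sketched as follows, assuming hypothetical `Node` records with `prev`, `parent` and `nodetype` fields, and treating an unparsed node as any terminal whose parent is a marker (which is how unparsed nodes are attached in section 5.9.1). On SUSPEND the caller would delete the requested nodes and call exception(NOACTION); that part is not modelled, and returning RESUME with no node at the start of the frontier is this sketch's own assumption.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, Tuple

EPSILON, COMMENT = "epsilon", "commentcode"   # hypothetical token codes


@dataclass(eq=False)
class Node:
    token: str
    nodetype: str = "TERM"              # TERM, NT or MARKER
    parent: Optional[Node] = None
    prev: Optional[Node] = None


def linked(*nodes: Node) -> Node:
    """Set prev links left to right; return the last node."""
    for left, right in zip(nodes, nodes[1:]):
        right.prev = left
    return nodes[-1]


def is_unparsed(n: Node) -> bool:
    return n.parent is not None and n.parent.nodetype == "MARKER"


def reinitialize(activenode: Node) -> Tuple[str, Optional[Node]]:
    """Back up from activenode to the last previously shifted token.

    Returns ("RESUME", node) if a parse context exists, or ("SUSPEND", node)
    if a marker or unparsed node is met first.
    """
    n = activenode.prev
    while n is not None and n.token in (EPSILON, COMMENT):
        n = n.prev
    if n is None:
        return "RESUME", None  # nothing before activenode (assumption)
    if n.nodetype == "MARKER" or is_unparsed(n):
        return "SUSPEND", n
    return "RESUME", n


marker = Node("marker", "MARKER")
act = linked(Node("id"), marker, Node(COMMENT), Node(EPSILON), Node(";"))
print(reinitialize(act)[0])   # SUSPEND
```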
If the parser succeeds in its reinitialization, successfully processes all of the nodes in the
nextusernode list, and then encounters an unparsed or marker node during a call to nextsym, an
attempt can be made to continue the parse. If nextnode points directly to a marker node, the
marker can simply be deleted from the tree and nextnode advanced. If nextnode points to an
unparsed node, then the node can be returned to the parser, and nextnode advanced. The parse
should be continued because a node which previously caused an error might parse correctly now,
since the parse context immediately before it may be different than before. Once the last
unparsed node is passed to the parser, the marker node effectively drops out of the tree. Only the
lthread attributes of the terminal node following the last unparsed node, and of zero or more of
its parents, still point to the marker node. We must add a test to exception so that it resets the
lthread attribute when the node type is a marker as well as when it is a terminal node; then, if a
new suspension were to occur at this point, these fields would all be reset to point to a new marker
node, leaving no further reference to the original marker. If the parse does continue beyond this
point, the lthread attribute of this terminal node will be altered as soon as it is pushed onto the
parse stack, along with those of its parents as soon as they are processed. If the reparse
progresses in such a way that these nonterminal nodes are not reprocessed, then they will be
excluded from the final tree when the match condition holds, and their reference to the deleted
marker node will be irrelevant. Therefore, the only modification required to the algorithm is made
to the block of code in nextsym headed by “if nextnode ≠ nil then { ... }”, which is changed to:

if nextnode ≠ nil then {
    while token(nextnode) = ε or nodetype(nextnode) = MARKER do
        nextnode ← next(nextnode);
    N ← nextnode;
    nextnode ← next(nextnode);
}
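A Python sketch of the revised branch, assuming hypothetical `Node` records and token codes; only the nextnode branch of nextsym is modelled, and the savenode and nextusernode branches are omitted.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

EPSILON, EOF = "epsilon", "eof"    # hypothetical token codes
TERM, MARKER = "TERM", "MARKER"


@dataclass(eq=False)
class Node:
    token: str
    nodetype: str = TERM
    next: Optional[Node] = None


def linked(*nodes: Node) -> Node:
    """Set next links left to right; return the first node."""
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
    return nodes[0]


class Input:
    """Models only the nextnode branch of nextsym."""

    def __init__(self, first: Node) -> None:
        self.nextnode: Optional[Node] = first

    def nextsym(self) -> Node:
        n = self.nextnode
        assert n is not None, "the frontier always ends with an eof token"
        # Old epsilon tokens are dropped (the parser rebuilds them when it
        # needs them) and markers are stepped over. The unparsed nodes that
        # follow a marker are ordinary terminals and go to the parser.
        while n.token == EPSILON or n.nodetype == MARKER:
            n = n.next
            assert n is not None
        self.nextnode = n.next
        return n


src = Input(linked(Node(";"), Node(EPSILON), Node("marker", MARKER),
                   Node("hue"), Node("="), Node(EOF)))
print([src.nextsym().token for _ in range(4)])  # [';', 'hue', '=', 'eof']
```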
Only one other case remains. Recall that whenever a marker node is inserted into the tree,
the lthread attribute of the following terminal node and of any of its parents with an identical
lthread attribute is reset to point to the marker node. Therefore, any stack traces which
pass through these nodes will pass through the marker node and continue to the left, always
terminating in the bottom-of-stack node B. Any other stack traces which pass through a parent
node whose lthread attribute was not reset will not encounter the marker, and the parse can
proceed normally. So the only additional check by the parser occurs in the reduce routine, to
determine if any node in the handle about to be reduced is a marker node. If one is detected, the
reduction cannot be made, and the parser must suspend, since there is inadequate context to be
used. A call to exception is made instead, and the parser inserts a suspension point just before
nextnode. The parse must now be suspended, so reduce must be further modified to return a
value: 0 if the reduction proceeded normally, 1 otherwise. The parser checks the return code
from reduce, and if it is non-zero, immediately jumps to (4.2) for termination.
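The marker check in reduce can be sketched with a Python list standing in for the lthread-linked stack (a simplification; `Node`, `reduce` and the `log` parameter are hypothetical). The function returns 0 or 1 as described above, and a caller would jump to (4.2) when it sees 1.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List

TERM, NT, MARKER = "TERM", "NT", "MARKER"


@dataclass(eq=False)
class Node:
    token: str
    nodetype: str = TERM
    children: List[Node] = field(default_factory=list)


def reduce(stack: List[Node], lhs: str, n: int, log: List[str]) -> int:
    """Reduce the top n stack nodes to `lhs`; 0 on success, 1 on suspension."""
    assert n >= 1, "this sketch assumes a non-empty handle"
    handle = stack[-n:]
    if any(node.nodetype == MARKER for node in handle):
        log.append("exception(NOACTION)")  # stand-in for the real call
        return 1                           # stack left untouched
    del stack[-n:]
    stack.append(Node(lhs, NT, handle))
    return 0


log: List[str] = []
stack = [Node("type"), Node("marker", MARKER), Node("id")]
print(reduce(stack, "name", 1, log))   # 0: handle is just 'id'
print(reduce(stack, "decl", 3, log))   # 1: the marker lies inside the handle
print([node.token for node in stack], log)
# ['type', 'marker', 'name'] ['exception(NOACTION)']
```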

With these three cases accounted for, the parser can now support general editing
throughout the tree, regardless of the number of outstanding errors. Although in two of these
cases the parser must suspend when it encounters a marker or unparsed node, the user's input
will still be entered into the tree and displayed, so that flexible editing is supported.

5.10. Extension to Support a shiftreduce Parse Action Optimization

Some parser-generators optimize their parse tables by providing another parse action in
addition to the basic four actions: shift, reduce, error, and accept. This fifth action, shiftreduce,
is returned whenever it has been determined that a shift action will produce a stack containing a
complete handle to be reduced, and a reduction can immediately be performed. By providing
this fifth action, both the number of states required for the parse tables and the number of steps
the parser must take can be reduced.
Back in chapter 3, a sample parse was presented in Figure 3-2. Although the example did
not show any shiftreduce actions, moves 1a, 1b, 4a, 4b, 5a, 5b, 8a and 8b made by that parser
could have been combined into single moves 1, 4, 5, and 8, the actions replaced in the parse
tables by actions “sr6”, “sr6”, “sr5”, and “sr6” respectively, and parse states 9, 10, and 11
deleted from the parse tables. The reader should note that the parse table states eligible for this
treatment are the ones in which the only non-error actions are reductions by a single production.
In this case, all “s9” actions would be replaced by “sr5” actions, all “s10” actions replaced by
“sr6” actions, and all “s11” actions replaced by “sr8” actions.
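The table transformation described above can be sketched in Python. The table encoding (dictionaries of `("s", state)`, `("r", production)`, `("sr", production)` and `("acc", 0)` tuples) and the toy grammar are assumptions, not the format of Figure 3-2; the extra check that no goto entry leads to a candidate state is this sketch's own precaution, since such a state could not simply be deleted.

```python
from typing import Dict, Tuple

Action = Tuple[str, int]            # ("s", state), ("r", prod), ("sr", prod), ("acc", 0)
ActionTable = Dict[int, Dict[str, Action]]
GotoTable = Dict[int, Dict[str, int]]


def shiftreduce_optimize(action: ActionTable, goto: GotoTable) -> ActionTable:
    """Fold shifts into reduce-only states into single 'sr' actions."""
    # States whose only non-error action is a reduction by one production.
    reduce_only: Dict[int, int] = {}
    for state, row in action.items():
        distinct = set(row.values())
        if len(distinct) == 1:
            kind, prod = next(iter(distinct))
            if kind == "r":
                reduce_only[state] = prod
    # Keep any state that is also the target of a goto entry.
    targets = {s for row in goto.values() for s in row.values()}
    reduce_only = {s: p for s, p in reduce_only.items() if s not in targets}

    new: ActionTable = {}
    for state, row in action.items():
        if state in reduce_only:
            continue                      # state deleted from the tables
        new[state] = {
            tok: ("sr", reduce_only[a[1]]) if a[0] == "s" and a[1] in reduce_only else a
            for tok, a in row.items()
        }
    return new


# Toy grammar: 1: E -> E + T   2: E -> T   3: T -> n   ("$" is end of input)
ACTION: ActionTable = {
    0: {"n": ("s", 3)},
    1: {"+": ("s", 4), "$": ("acc", 0)},
    2: {"+": ("r", 2), "$": ("r", 2)},
    3: {"+": ("r", 3), "$": ("r", 3)},
    4: {"n": ("s", 3)},
    5: {"+": ("r", 1), "$": ("r", 1)},
}
GOTO: GotoTable = {0: {"E": 1, "T": 2}, 4: {"T": 5}}

opt = shiftreduce_optimize(ACTION, GOTO)
print(sorted(opt))      # [0, 1, 2, 4, 5]
print(opt[0]["n"])      # ('sr', 3)
```

States 2 and 5 also reduce by a single production, but they are entered only through goto entries, so this sketch leaves them in place.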

Adapting the parsing algorithm for this extension requires simple extensions to sections
(2.1) and (3.3). A sixth case, labeled (f) in the final presentation of the algorithm, needs to be
added for f = SHIFTREDUCE. The code for this new case is simply the code for the shift action
followed immediately by the code for the reduce action, which appear in the accompanying
sections (a) and (b).

5.11. Algorithm 5.2: The SAGA Incremental LR(1) Parser

The extensions to the incremental parser are now complete. Some other attributes are
added to the parse tree nodes in the next chapter and used to support editor operations. But these
attributes are not required for incremental parsing, and they are neither maintained nor
referenced by the incremental parser, so their presentation has been left for later chapters; this
keeps the present chapter focused on the essential parser extensions.

The extended incremental LR(1) parser is now restated. This algorithm now handles LR(1)
grammars, epsilon productions, comments, multiple syntax errors, and partial parses
(suspensions). The next chapter discusses the editor/parser interface and the command interpreter of
the SAGA editor.
5.11.1. Routines used in the Parser

alloc():
    Allocate a new node N.
    addr(N) ← N;
    rdescend(N) ← N;
    return(N).

apply_match:
    Let A → α be the reduction for which the matching condition holds.
    parent(stack(stacktop, j)) ← parent(irmark), ∀0 ≤ j < |α|;
    sibling(stack(stacktop, j)) ← stack(stacktop, j − 1), ∀0 < j < |α|;
    sibling(stacktop) ← nil.

chain(N, at, M):
    Let M be a node in the frontier of T, N a node to be added,
    and at be one of BEFORE or AFTER.

    if at = BEFORE then {
        next(N) = addr(M);
        prev(N) = prev(M);
        if prev(M) ≠ nil then
            next(prev(M)) = addr(N);
        prev(M) = addr(N);
    }
    if at = AFTER then {
        next(N) = next(M);
        prev(N) = addr(M);
        if next(M) ≠ nil then
            prev(next(M)) = addr(N);
        next(M) = addr(N);
    }

exception(kind):

    Let kind be either ERROR or NOACTION, according to whether a syntax error
    or a suspension has occurred.
    Let M be a marker node, to note the point of error.
    Let N be the incorrect parse node, or nil if a suspension has occurred.

    M = alloc();
    if kind = ERROR then {
        leftson(M) = N;
        parent(N) = addr(M);
        if the parser is in section (2) then {
            chain(N, BEFORE, activenode);
            chain(N_j, BEFORE, activenode), ∀1 ≤ j ≤ n, where the N_j are on the
                nextusernode list;
            parent(N_j) = addr(M), ∀1 ≤ j ≤ n.
        } (otherwise we are reparsing y, and N is already in the terminal list)
    } else {
        leftson(M) = nil;
        chain(M, BEFORE, activenode);
    }
    lthread(M) = stacktop;
    if nodetype(lthread(activenode)) = TERM or nodetype(lthread(activenode))
            = MARKER then
        lthread(N_k) = addr(M), ∀N_k, where lthread(N_k) = lthread(activenode).

matchcond():
    Let A → α be the reduction to be applied.

    if irmark = stack(stacktop, j), for some 0 ≤ j < |α|
        and parent(irmark) = parent(stack(irmark, k)), ∀0 ≤ k < |α| − j
        and parent(irmark) ≠ parent(stack(irmark, |α| − j))
        and token(parent(irmark)) = A
        and rdescend(stacktop) = rdescend(parent(irmark))
    then
        return(true);
    else
        return(false).

nextsym():
    Let N be a pointer to a parse tree node.
    Variable savenode is set in routine prepend below.

    if savenode ≠ nil then {
        N ← savenode;
        savenode ← nil;
    } else if nextusernode ≠ nil then {
        N ← nextusernode;
        nextusernode ← next(nextusernode);
    } else if nextnode ≠ nil then {
        while token(nextnode) = ε or nodetype(nextnode) = MARKER do
            nextnode ← next(nextnode);
        N ← nextnode;
        nextnode ← next(nextnode);
    } else {
        N ← alloc();
        token(N) ← eof; (the end-of-file token code)
    }
    return(N).

prepend(N):
    savenode ← addr(N).

reduce(i, M):
    Let i be production A → α, and N_j be the nodes in the handle to be reduced,
    1 ≤ j ≤ n, n = |α|.

    if n = 0 then
        n ← 1;
    if nodetype(N_j) = MARKER, for some 1 ≤ j ≤ n, then {
        exception(NOACTION);
        return(1);
    }
    parent(stack(stacktop, j)) ← addr(M), ∀0 ≤ j < n;
    sibling(stack(stacktop, j)) ← stack(stacktop, j − 1), ∀0 < j < n;
    sibling(stacktop) ← nil;
    rdescend(M) ← rdescend(stacktop);
    token(M) ← A;
    prepend(M);
    return(0).

shift(N, newstate):
    lthread(N) ← stacktop;
    pstate(N) ← newstate;
    stacktop ← addr(N).

unchain(N):
    prev(next(N)) = prev(N);
    next(prev(N)) = next(N).
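The chain and unchain routines above translate directly into Python. This is a sketch: `Node` and the `tokens` helper are hypothetical, and unchain here adds nil checks and clears the removed node's links, which the listing does not do (the listing assumes the node has neighbours on both sides).

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional

BEFORE, AFTER = "BEFORE", "AFTER"


@dataclass(eq=False)
class Node:
    token: str
    prev: Optional[Node] = None
    next: Optional[Node] = None


def chain(n: Node, at: str, m: Node) -> None:
    """Link new node n into the frontier just before or after node m."""
    if at == BEFORE:
        n.next, n.prev = m, m.prev
        if m.prev is not None:
            m.prev.next = n
        m.prev = n
    else:
        n.prev, n.next = m, m.next
        if m.next is not None:
            m.next.prev = n
        m.next = n


def unchain(n: Node) -> None:
    """Remove n from the frontier (nil checks added for safety)."""
    if n.next is not None:
        n.next.prev = n.prev
    if n.prev is not None:
        n.prev.next = n.next
    n.prev = n.next = None


def tokens(first: Node) -> List[str]:
    out: List[str] = []
    node: Optional[Node] = first
    while node is not None:
        out.append(node.token)
        node = node.next
    return out


a, c = Node("a"), Node("c")
chain(c, AFTER, a)
b = Node("b")
chain(b, BEFORE, c)
print(tokens(a))  # ['a', 'b', 'c']
unchain(b)
print(tokens(a))  # ['a', 'c']
```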


5.11.2. The SAGA Incremental LR(1) Parser

parse(activenode, deletecount, nextusernode, parseoption):

Let T be the parse tree for the string w = xzy.
Let z' be a replacement string for z, and w' = xz'y the result.

1. Initialization

(1.1) if w ≠ ε (the empty string) then {
          N ← activenode; (the first symbol in zy)
          while N ∈ z do { (delete z)
              N ← next(N);
              unchain(prev(N));
          }
          activenode ← N;
          nextnode ← activenode;
          N ← prev(N); (reset the parser...)
          while token(N) = ε or token(N) = commentcode do
              N ← prev(N);
          irmark ← lthread(N);
          stacktop ← lthread(N);
          if nodetype(N) = MARKER then {
              exception(NOACTION);
              jump to (4.2)
          }
      }

(1.2) if w = ε (i.e., w' is being parsed from scratch) then {
          irmark ← B;
          stacktop ← B;
          nextnode ← nil;
      }

2. Analysis of z'

(2.1) N ← nextsym();
      action(pstate(stacktop), token(N), f, i);
      Execute (a), (b), (c), (d), (e) or (f) according to the value of f.

      (a) f = SHIFT.
          if the symbol to be shifted is activenode then
              jump to (3);
          else {
              shift(N, i);
              chain(N, BEFORE, activenode);
          }

      (b) f = REDUCE i. Let i be the production A → α.
          if irmark = stack(stacktop, j) for some 0 ≤ j < |α| (i.e., irmark must be updated)
          then
              irmark ← stack(stacktop, |α|);
          if |α| > 0 then {
              M ← alloc();
              if reduce(i, M) = 1 then
                  jump to (4.2);
          } else {
              N ← alloc();
              token(N) ← ε;
              shift(N, i);
              M ← alloc();
              if reduce(i, M) = 1 then
                  jump to (4.2);
          }

      (c) f = ERROR.
          exception(ERROR);
          jump to (4.2).

      (d) f = ACCEPT.
          Jump to (4.1).

      (e) f = NOACTION.
          chain(N, BEFORE, activenode).

      (f) f = SHIFTREDUCE.
          Execute sections (2.1a) and (2.1b) above.

3.  Analysis  of  y 

(3.1)  If  parseoption  = SUSPEND  then 

jump  to  (4.2). 

M «—  nextsym))-,  (Let  M be  the  node  which  stores  the  first  symbol  of  y.) 

old  table  «—  pstate(M); 
shift) M,  i)\ 

(3.2)  if  oldtable  pstate(stacktop)  then 

jump  to  (3.3); 

Otherwise,  skip  steps  of  the  analysis  of  y as  follows: 
while  sibling (stacktop)  # nil  do  { 

stacktop  «—  sibling(stacktop);  (we  enter  directly  in  a reduction  state). 

M ■*—  stacktop; 
j 

action(pstate)stacktop),  token(M),  f,  i);  (we  know  / = REDUCE  i,  » being 
production  A —*  o). 
if  matchcond  holds  then  { 

apply_match; 

accept  w’,  terminating  the  algorithm. 

if irmark = stack(stacktop, j) for some 0 <= j < |α| then
irmark ← stack(stacktop, |α|);
oldtable ← pstate(parent(stacktop));

if parent(stack(stacktop, j)) = parent(stack(stacktop, k)) ∀ 0 <= j, k < |α| then {
the entire subtree of T rooted in parent(stacktop) is reused:

M ← parent(stacktop);

lthread(M) ← stack(stacktop, |α|);

action(pstate(stacktop), token(M), f, newvalue);
pstate(M) ← newvalue.

} else  { 

a new  node  is  allocated: 

M ← alloc();

if reduce(i, M) = 1 then
jump  to  (4.2); 

} 

Jump  to  (3.2). 

(3.3) M ← nextsym(input);
action(pstate(stacktop), token(M), f, i);

Execute (a), (b), (c), (d), (e) or (f) according to the value of f.

(a)  f = SHIFT. 

oldtable ← pstate(M);

shift(M, i);

jump  to  (3.2). 
(b) f = REDUCE i. Let i be production A → α;
if matchcond holds then {

apply_match;

jump to (4.1);

}

if irmark = stack(stacktop, j) for some 0 <= j < |α| then

irmark ← stack(stacktop, |α|);

M ← alloc();

if reduce(i, M) = 1 then
jump to (4.2);
jump to (3.3).

(c)  f = ERROR. 

exception(ERROR);
jump  to  (4.2). 

(d)  f = ACCEPT. 

Jump  to  (4.1). 

(e) f = NOACTION.

(f)  f = SHIFTREDUCE. 

Execute  sections  (3.3a)  and  (3.3b)  above. 

4.  Termination 

(4.1) status = COMPLETE;
return(status).

(4.2) status = SUSPEND;
return(status). 
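The stack conditions tested in steps (2.1b), (3.2) and (3.3b) can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the SAGA implementation: the Node class is hypothetical, and its lthread field is assumed to link each node to the node beneath it on the parse stack, so that stack(M, j) simply follows j such links down from M.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    """Hypothetical parse-tree node for illustration only."""
    name: str
    lthread: Optional["Node"] = None  # assumed link to the node beneath on the stack
    parent: Optional["Node"] = None


def stack(top: Node, j: int) -> Optional[Node]:
    """stack(top, j): the node j positions below top (stack(top, 0) is top itself)."""
    node: Optional[Node] = top
    for _ in range(j):
        node = node.lthread if node is not None else None
    return node


def update_irmark(irmark: Optional[Node], top: Node, rhs_len: int) -> Optional[Node]:
    """If irmark is one of the |alpha| nodes about to be reduced, move it to stack(top, |alpha|)."""
    if any(stack(top, j) is irmark for j in range(rhs_len)):
        return stack(top, rhs_len)
    return irmark


def subtree_reusable(top: Node, rhs_len: int) -> bool:
    """True if the |alpha| topmost stack nodes all share one (non-nil) parent."""
    if rhs_len == 0:
        return False
    parents = [stack(top, j).parent for j in range(rhs_len)]
    return parents[0] is not None and all(p is parents[0] for p in parents)
```

The explicit check that the shared parent is not nil is an added assumption; the algorithm above only requires the parents to be equal.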


5.12.  Summary 

In this chapter, we have presented the editor's incremental parser. We altered the attributes associated with the parse tree nodes to make the parse tree suitable for use with an editor. The parsing algorithm was extended from LR(0) to LR(1) grammars, and also to grammars containing productions with empty right-hand sides.

We proposed a new way to handle comments, which permits their general use in language-oriented editors, as in text editors, resolves the problem of storing them in parse trees, and permits uniform access by editor commands to both comments and syntactically meaningful tokens in the tree.
We redefined the reduce operation, proposing an alternative which permits the parser to treat non-terminal and terminal tokens uniformly. We also combined the parsing action and goto functions into a single action. This eliminated duplicate code, improving efficiency, and provided support for the editor to pass sub-trees to the parser.

We described our error handler, which permits editing throughout the parse tree in the midst of multiple errors, as well as editing of erroneous text which has not yet been parsed. We elected to provide error recovery, not error correction, since recovery supports the above abilities while letting the programmer correct his own errors, which we found to be simpler for both the programmer and the editor. We added the ability to perform a partial parse, analyzing only the new input and then suspending the parse to await further instructions. This feature permits controlled editing which takes the program through incorrect intermediate states, improving the flexibility of the editor.
CHAPTER 6

THE  SAGA  EDITOR 


The SAGA editor has been designed to be modular and retargetable to more than one language. The modularity concentrates the language-dependent code in a few modules, allowing most of the source code to be reused intact when editors are built for new languages. It also permits experimentation with different parser-generating systems for a given language, so that the strengths and weaknesses of different systems can be compared. A pictorial breakdown of the editor's modular structure is presented in Figure 6-1. This chapter discusses these modules and their interactions with one another: the editor/parser interface first, then editor commands, then editor interaction with other development tools, and finally the editor interface to the file system. The next chapter discusses the generation of editors for different languages.

6.1. The Editor/Parser Interface

The editor/parser interface consists of four modules: the parse tree constructor on the editor side of the interface, and the lexical analysis, syntax analysis, and semantic analysis modules on the language-dependent parser side. The parse tree constructor implements the incremental LR(1) parsing algorithm presented in the previous chapter. Header files are provided for each of the analysis modules; any parser-generating system can be used with the SAGA editor, provided that code meeting the requirements of this interface can be written for the tables it produces.
Figure 6-1: SAGA Editor Modular Structure (diagram not reproduced; surviving labels: User, Files)


During editing and language analysis, all interaction between the editor and the parser occurs through two routines: tokenize and parse. The tokenize routine converts a buffer of characters into a linked list of terminal nodes; the parse routine inserts these nodes into the parse tree, also removing any nodes which are being deleted. Calls to the semantic analyzer are embedded within the incremental parser. There is also a parser initialization routine, as well as a few other routines that generate the follow set for the language and support the parse tree constructor.

6.1.1.  Lexical  Analysis 

When  the  user  makes  a change  to  his  program,  the  input  handler  of  the  editor  constructs  a 
text  image  of  the  input  and  the  token  being  modified.  If  the  change  is  between  two  tokens  with 
no  intervening  space,  the  text  from  both  tokens  is  included  in  the  image.  The  lexical  analysis 
routine  tokenize  then  tokenizes  this  image.  If  the  input  spans  several  lines,  tokenize  is  called  on 
each line as it is completed, and the returned nodes are appended to the nextusernode list being
constructed.  The  change  may  cause  the  text  to  the  right  of  the  modification  to  be  re-examined, 
in  which  case  the  analyzer  may  need  to  request  further  input  from  the  input  handler  in  order  to 
properly  complete  its  task.  A lookahead  character  (the  character  on  the  screen  immediately  after 
the  text  image)  is  always  passed  to  the  analyzer,  which  it  may  use  to  decide  whether  it  requires 
further  input.  If  the  lookahead  character  is  not  part  of  the  current  token,  then  the  tokenizer  is 
finished,  and  returns  a list  of  parse  tree  terminal  nodes  which  represent  the  tokens.  Otherwise, 
the  tokenizer  returns  the  list  of  terminal  nodes  and  the  remaining  text  image,  with  a request  to 
be  called  again  with  further  input.  In  the  case  of  a matchfix  token  (such  as  a comment  or  string) 
which  has  not  yet  been  completely  recognized,  the  tokenizer  returns  and  the  input  handler  enters 
a “token  collection”  mode  in  which  the  user  can  skip  the  cursor  forward  to  include  existing  text 
in  the  new  matchfix  token.  The  user  can  then  insert  the  terminating  delimiter  at  an  appropriate 
point. 

The interface to the language-dependent lexical analysis routines is defined in the header file lexfns.h, summarized in Figure 6-2.
error ← lexinit(treeexists);

nodelist ← tokenize(buffer, nextc, lastc, lookahead, addinput);
where: 

addinput: A Boolean, set by tokenize to request more input.

buffer: A buffer of characters.

lastc: The last character position used in a buffer.

lookahead: The character after the last character passed to tokenize.

nextc:  The  first  character  position  used  in  a buffer. 

treeexists:  A Boolean,  set  to  true  if  an  existing  tree  is  being  edited. 

Figure  6-2:  Lexical  Analysis  Interface 


When  a lexical  error  is  encountered,  the  remainder  of  the  input  string  is  stored  in  a 
separate  parse  tree  terminal  node,  marked  as  an  unknown  token,  and  returned  along  with  any 
other  terminal  nodes  that  were  constructed.  The  calling  routine  can  still  make  further  calls  to 
the  lexical  analyzer,  until  all  input  has  been  lexically  analyzed.  These  nodes  will  still  be  passed  to 
the  parser  later. 
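The tokenize contract of Figure 6-2, including the unknown-token rule above, can be sketched as follows. This is a minimal Python sketch assuming a toy token set (identifiers, integers, and a few symbols); the Terminal class and TOKEN_RE are hypothetical, and since Python has no var parameters, nextc and addinput are returned in a tuple alongside the node list.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Terminal:
    """Hypothetical terminal node: token class plus source text."""
    kind: str
    text: str


# Hypothetical toy token set; ":=" is listed before ":" so the longer symbol wins.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*)|(?P<number>[0-9]+)|(?P<symbol>:=|[;:()+*-]))"
)


def continues(kind: str, lookahead: str) -> bool:
    """Could the lookahead character extend a token of this class?"""
    if kind == "ident":
        return lookahead.isalnum()
    if kind == "number":
        return lookahead.isdigit()
    return False


def tokenize(buffer: str, nextc: int, lastc: int, lookahead: str) -> Tuple[List[Terminal], int, bool]:
    """Tokenize buffer[nextc:lastc]; return (nodes, nextc, addinput)."""
    nodes: List[Terminal] = []
    pos = nextc
    while pos < lastc:
        m = TOKEN_RE.match(buffer, pos, lastc)
        if m is None:
            rest = buffer[pos:lastc].strip()
            if rest:
                # Lexical error: the remainder becomes one unknown-token node.
                nodes.append(Terminal("unknown", rest))
            return nodes, lastc, False
        kind = m.lastgroup
        if m.end() == lastc and continues(kind, lookahead):
            # The last token may run past the buffer: request more input.
            return nodes, m.start(kind), True
        nodes.append(Terminal(kind, m.group(kind)))
        pos = m.end()
    return nodes, pos, False
```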

When the tokenizer requests further input, it sets addinput to true and returns with nextc set to the first character not included in any token. Its caller copies the characters between nextc and lastc back to the beginning of the buffer, retrieves the text representation of the following token, appends it to this buffer, and marks that token for deletion. It then calls the tokenizer again; this process repeats until the buffer can be entirely tokenized.
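The caller's side of this protocol is a simple loop. In the Python sketch below, which assumes a tokenize function with the hypothetical signature tokenize(buffer, nextc, lastc, lookahead) -> (nodes, nextc, addinput), the old tokens following the edit are modelled as a list of strings assumed to be adjacent on the screen, and the function reports how many of them were marked for deletion. The word_tokenize function is a toy stand-in for a language-dependent tokenizer.

```python
from typing import Callable, List, Tuple

Tokenizer = Callable[[str, int, int, str], Tuple[list, int, bool]]


def word_tokenize(buffer: str, nextc: int, lastc: int, lookahead: str) -> Tuple[list, int, bool]:
    """Toy tokenizer: whitespace-separated words. It asks for more input when the
    last word touches the end of the buffer and the lookahead could extend it."""
    text = buffer[nextc:lastc]
    words = text.split()
    if words and not text[-1].isspace() and not lookahead.isspace():
        start = nextc + len(text) - len(words[-1])  # first character of the unfinished word
        return words[:-1], start, True
    return words, lastc, False


def tokenize_edit(text: str, following: List[str], tokenize: Tokenizer = word_tokenize,
                  end_lookahead: str = "\n") -> Tuple[list, int]:
    """Tokenize new text, pulling in following old tokens while more input is requested.
    Returns (new nodes, number of old tokens marked for deletion)."""
    nodes: list = []
    deleted = 0
    buffer = text
    while True:
        lookahead = following[deleted][0] if deleted < len(following) else end_lookahead
        got, nextc, addinput = tokenize(buffer, 0, len(buffer), lookahead)
        nodes.extend(got)
        if not addinput:
            return nodes, deleted
        if deleted == len(following):
            raise ValueError("tokenizer requested input past the end of the text")
        # Copy buffer[nextc:lastc] to the front and append the next old token's text;
        # that old token is now marked for deletion.
        buffer = buffer[nextc:] + following[deleted]
        deleted += 1
```

For example, typing "ab cd" directly in front of the old adjacent tokens "ef" and "gh" makes the loop absorb both, yielding the nodes "ab" and "cdefgh" with two old tokens marked for deletion.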

Because the input is completely tokenized before any parsing is performed, text read from an input file is guaranteed to become part of the frontier of the parse tree whether or not it is syntactically correct, without additional code to handle this situation as a special case. The implementation is also simplified, since a syntax error in a text file need not be treated differently from a syntax error in text typed by the user.

6.1.2.  Syntax  Analysis 

After lexical analysis is complete, the parser is called to insert the new nodes into the parse tree. The parser can be run in one of two modes, suspend or complete, according to the command given by the user. Normally the user asks the parser to run to completion, so that it reparses y after it has finished parsing z', where z' represents the new input and y the remainder of the terminal string past the new input. However, the user can request a partial parse, which causes the parser to suspend after analyzing z' and before any reparsing of y. The parser also suspends whenever it encounters a syntax error, regardless of the parse requested. A suspension leaves the parse tree with a discontinuity, but with the state and local structure saved so that the parse can be resumed later. Deletions can also be processed with either a full or a partial parse.
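The effect of the two parse options can be illustrated with a deliberately simplified model. The Python sketch below assumes the parser is abstracted as a transition function step(state, token) that returns the next state, or None on a syntax error; it ignores the subtree reuse of the real algorithm, and the Marker record standing in for the state saved at a discontinuity is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple, Union


class Status(Enum):
    COMPLETE = "complete"
    SUSPEND = "suspend"


@dataclass
class Marker:
    """Hypothetical record saved at a discontinuity so the parse can resume later."""
    state: int
    position: int  # index of the first token not yet analyzed


Step = Callable[[int, str], Optional[int]]


def two_phase_parse(zprime: List[str], y: List[str], suspend: bool, step: Step,
                    state: int = 0) -> Tuple[Status, Union[int, Marker]]:
    """Analyze z', then (unless a partial parse was requested) reparse y."""
    tokens = zprime + y
    limit = len(zprime) if suspend else len(tokens)
    for pos in range(limit):
        nxt = step(state, tokens[pos])
        if nxt is None:
            # A syntax error suspends the parse whatever option was requested.
            return Status.SUSPEND, Marker(state, pos)
        state = nxt
    if limit < len(tokens):
        # Partial parse: stop after z', before any reparsing of y.
        return Status.SUSPEND, Marker(state, limit)
    return Status.COMPLETE, state
```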

Parser suspension permits modifications which take the parse tree through illegal intermediate configurations, as, for example, when a begin symbol and a distant matching end symbol are being inserted or deleted. This option greatly increases the flexibility of editing operations, since the user can make a change in several operations without concern for maintaining syntactic correctness at each step.

When a syntax error is encountered, the offending token (which could be the unknown token constructed above) is highlighted and diagnostics are displayed. All new terminal nodes are still inserted into the parse tree on the prev/next list discussed earlier; thus the display manager can access them even when they cannot be successfully parsed. The user has the option of repairing the error immediately, or of scanning through other portions of the program and possibly making modifications there (needed, for example, if a begin keyword was mistakenly omitted and its matching end was just encountered).

Syntax analysis is performed in the parse tree constructor and syntax analysis modules of the editor. The interface to the language-dependent syntactic analysis routines is defined in the header file parsefns.h, summarized in Figure 6-3. The parse tree constructor, parse, is the incremental LR(1) parser presented at the end of the previous chapter. It communicates with the parser decision routines through an interface which supports the basic shift-reduce parsing algorithm. This interface permits the use of different parsers from a variety of parser-generating systems in the construction of SAGA editors. Any parser-generating system may be used if the resulting parser and its tables can support the functions required by the interface. Since different parser-generators have different capabilities, this permits us to choose the generator best suited for a particular language.

var Vpgenname: alfa;   Name of the parser-generator used.
    Vpgenrev: alfa;    Version of the parser-generator.
    Vlangname: alfa;   Name of the language recognized.
    Vlangrev: alfa;    Version of the language (grammar spec.).

error ← initparser(treeexists);

action(tokencode, state, f, newvalue);

legalnonterm(state, stackptr, tokenlist, length);

legalterm(state, stackptr, tokenlist, length);

nametokencode(tokencode, buffer, lastc);

tokencode ← ruleleftside(rulenumber);

length ← rulelength(rulenumber);

where:

buffer: A buffer of characters.

f: The parsing action returned.

lastc: The last character position used in a buffer.

length: The number of items in a returned list.

rulenumber: A production rule number.

state: The current parse state.

stackptr: A pointer to the node on the top of the parse stack.

tokencode: The code (integer) assigned to a token of the grammar.

tokenlist: An array of token codes.

treeexists: A Boolean, set to true if an existing tree is being edited.

Figure 6-3: Interface to Syntax Analysis Routines

The parse routine takes an editing location, a count of the number of nodes to be deleted, the list of nodes to be inserted, and the parsing option suspend/complete. It in turn calls a routine, action, on the language-dependent side of the interface, in the syntactic analysis module. The action routine is passed the current parse state and a token code; it uses these values to index into a set of parse tables and produces a parsing action together with either a new parse state or a production rule number, depending on the parsing action.
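A table-driven action routine of this kind can be sketched as follows. The grammar, token codes, and string encoding of the table entries are all hypothetical (the encoding resembles common LR table dumps), and the (f, newvalue) pair is returned as a tuple; the argument order follows Figure 6-3. The non-terminal E has its own column, so a goto after a reduction is simply a shift on E.

```python
from typing import Tuple

# Hypothetical token codes; terminals and the non-terminal E share one numbering.
N, PLUS, END, E = 0, 1, 2, 3

# Hypothetical combined action/goto table for rule 1: S -> E, rule 2: E -> E + n,
# rule 3: E -> n.  "s<k>": shift to state k; "r<k>": reduce by rule k;
# "acc": accept; "": error.
PARSE_TABLE = [
    #  n      +      $      E
    ["s2", "",   "",    "s1"],  # state 0
    ["",   "s3", "acc", ""],    # state 1
    ["",   "r3", "r3",  ""],    # state 2
    ["s4", "",   "",    ""],    # state 3
    ["",   "r2", "r2",  ""],    # state 4
]


def action(tokencode: int, state: int) -> Tuple[str, int]:
    """Return (f, newvalue): the parsing action and either a new state or a rule number."""
    entry = PARSE_TABLE[state][tokencode]
    if entry == "":
        return "ERROR", 0
    if entry == "acc":
        return "ACCEPT", 0
    f = "SHIFT" if entry[0] == "s" else "REDUCE"
    return f, int(entry[1:])
```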

Two routines, ruleleftside and rulelength, take a production rule number and return the token code of the token on the left-hand side of the rule and the length of the rule, respectively. The reduce routine described earlier uses them to obtain the necessary information about the production A → α to be used in the reduction.
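The reduction step that uses these two routines might look like the Python sketch below. The RULES table (matching the small grammar S → E, E → E + n, E → n), the list-of-pairs stack, and the tuple representation of a parent node are hypothetical; the goto callable stands for the action routine's shift on a non-terminal.

```python
from typing import Callable, List, Tuple

# Hypothetical rule table: rule number -> (left-hand token code, right-hand side length)
# for rule 1: S -> E, rule 2: E -> E + n, rule 3: E -> n.
RULES = {1: ("S", 1), 2: ("E", 3), 3: ("E", 1)}


def ruleleftside(rulenumber: int) -> str:
    return RULES[rulenumber][0]


def rulelength(rulenumber: int) -> int:
    return RULES[rulenumber][1]


def reduce(stack: List[Tuple[int, object]], rulenumber: int,
           goto: Callable[[int, str], int]) -> None:
    """Replace the top |alpha| (state, node) entries by one parent node for A."""
    n = rulelength(rulenumber)
    split = len(stack) - n
    children = [node for _, node in stack[split:]]
    del stack[split:]  # also correct when n == 0 (an empty right-hand side)
    lhs = ruleleftside(rulenumber)
    parent = (lhs, children)
    # The state beneath the popped entries decides where the new non-terminal goes.
    stack.append((goto(stack[-1][0], lhs), parent))
```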

Routines legalterm and legalnonterm pass in the current parse state and the value of the stacktop variable, and expect to receive a list of terminal or non-terminal token codes, respectively. These codes are used to construct the follow set to be displayed and to determine whether a non-terminal node can be inserted at a given spot in the terminal list.
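One simple way to derive these lists is to scan a row of the parse table for non-error entries. The sketch assumes a string-encoded table like a yacc-style dump, with hypothetical code ranges for terminals and non-terminals, and it ignores stackptr. That shortcut is only exact for canonical LR(1) tables; with LALR tables or default reductions, a reduce entry does not guarantee that the token can eventually be shifted, and the stack must then be consulted.

```python
from typing import List, Sequence

# Hypothetical token-code layout: codes 0-2 are terminals, code 3 is a non-terminal.
TERMINAL_CODES = range(0, 3)
NONTERMINAL_CODES = range(3, 4)


def legaltokens(table: Sequence[Sequence[str]], state: int, codes: Sequence[int]) -> List[int]:
    """Token codes whose entry in this state's row is not an error ("")."""
    return [code for code in codes if table[state][code] != ""]


def legalterm(table: Sequence[Sequence[str]], state: int) -> List[int]:
    return legaltokens(table, state, TERMINAL_CODES)


def legalnonterm(table: Sequence[Sequence[str]], state: int) -> List[int]:
    return legaltokens(table, state, NONTERMINAL_CODES)
```

The length result of the original interface corresponds to len() of the returned list.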

Routine nametokencode passes in a token code and expects to receive a text string which is the printable form of the token. If the token code is a reserved word or a special symbol (operators and punctuation), then that reserved word or symbol(s) should be returned. If the token code is a generic class, such as an identifier or constant, then that identifier or constant should be retrieved from the string table and returned. If the token code is a non-terminal token, then the text string used to name this token should be retrieved if available, or otherwise the production rule number enclosed in angle brackets. This value is most often used in debugging traces, but could also be used to display a non-terminal follow set (for a language designer, for example) or as a printable name for a placeholder for an unexpanded or elided sub-tree.
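The lookup order just described can be sketched in Python. The code tables below are hypothetical, the function returns a string instead of filling buffer and lastc, the identifier or constant text is passed in as lexeme rather than fetched from a real string table, and falling back to the class name when no lexeme is given is an added assumption.

```python
from typing import Optional

# Hypothetical code tables for a toy language.
SYMBOLS = {1: "begin", 2: "end", 3: ":="}           # reserved words and special symbols
GENERIC = {10: "identifier", 11: "constant"}       # generic token classes
NONTERMINAL_NAMES = {20: "statement"}              # non-terminals that have a name


def nametokencode(tokencode: int, lexeme: Optional[str] = None,
                  rulenumber: Optional[int] = None) -> str:
    """Printable form of a token, following the lookup order described above."""
    if tokencode in SYMBOLS:
        return SYMBOLS[tokencode]
    if tokencode in GENERIC:
        # The text would come from the string table; fall back to the class name.
        return lexeme if lexeme is not None else GENERIC[tokencode]
    if tokencode in NONTERMINAL_NAMES:
        return NONTERMINAL_NAMES[tokencode]
    return f"<{rulenumber}>"  # unnamed non-terminal: rule number in angle brackets
```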

Lastly, a call to a routine initparser is provided, which passes in a flag indicating whether an editing session is beginning with a new or preexisting file; this routine should contain code to load the parse tables and initialize any internal data structures used by the other routines in this module. It also should initialize several character strings (Vpgenname, Vpgenrev, Vlangname, and Vlangrev in Figure 6-3) which are used to display the version of the language and parser-generating system in use. Tables from any parser-generator can be used, as long as access code can be written to produce the required return values from them.

6.1.3.  Semantic  Analysis 

Support is provided for semantic analysis to be performed through a syntax-directed analysis scheme. The interface to the semantic analysis routines is defined in the header file semanfns.h, summarized in Figure 6-4. As the parser shifts and reduces parse tree nodes, it calls semantic evaluation routines in the semantic analysis module. The semantic routines can either evaluate the changes as they are made, or record the changes as the parser runs and perform the actual evaluation after the reparse has completed. When a semantic error is detected, a semantic error flag is set in the node, and that token is displayed in highlighted form. Since semantic errors do not affect the integrity of the parse tree, they have no impact upon the incremental parser. The user can repair the error when convenient.

Calls to these routines are placed into the incremental parsing algorithm, so that they are invoked automatically whenever a parse occurs. Any style of semantic analysis which can be performed using the values made available through these calls can be supported. In particular, the semantic analysis may be performed in step with the syntactic analysis, in a separate pass over parts of the tree after the syntactic analysis has completed, or as a separate process running in parallel with the editor. Since semantic analysis, even when performed incrementally, takes a significant amount of time to complete, analysis can be deferred and performed only when specifically directed by the user.

evalinit(treeexists);                      Beginning to edit a new file.

evalstart(stacktop, state);                An incremental reparse is beginning.

evalcontinue(tag);                         Parser has reached a suspension point;
                                           tag was returned by evalsuspend earlier.

evaldelete(activenode, count);             Delete count nodes, beginning at activenode.

evalshift(stacktop);                       A node has just been shifted.

evalreduce(stacktop, rulenumber, parent);  About to reduce stacktop by rulenumber
                                           to parent.

evalphase2(yfirst: nodeindex);             Beginning 2nd phase; the reparse of y.

tag ← evalsuspend(stacktop, state);        Suspending a parse; tag will be passed to
                                           evalcontinue later.

error ← evalfinish(subtreeroot);           Completed a parse; error set if any semantic
                                           errors.

error ← evalclose;                         Ending an editing session; error set if
                                           semantic data cannot be saved.

error ← eval(tree, activenode);            Called by editor eval command.

evalerror(activenode, buffer, lastchar);   Return semantic error message for termnode
                                           in buffer, of length lastchar.

evaldebug(tree, activenode);               Called by editor evaldebug command.

where:

activenode: parse tree node on which the editing cursor is positioned.

buffer: contains message describing semantic error.

lastchar: length of message in buffer.

parent: node to be parent after reduction is made.

rulenumber: production rule number used in this reduction.

stacktop: a pointer to the node on the top of the parse stack.

state: the current parse state.

subtreeroot: non-terminal node at which incremental reparse terminated.

tag: an integer returned by evalsuspend; passed to evalcontinue later.

tree: a pointer to the header record for the parse tree.

treeexists: set to true if the parse tree already exists.

yfirst: the first old terminal node after the new input nodes.

Figure 6-4: Interface to Semantic Analysis Routines

Routine evalinit is called during editor initialization, to permit the semantic evaluator to initialize its data structures and start up its own process if one is desired. Evalstart is called whenever a new parse begins, except when the parse begins at a suspension point, in which case evalcontinue is called instead. Evalcontinue is also called each time the parser reaches a discontinuity in the nextnode list. Evaldelete is called before the parser actually modifies the tree; it is passed both activenode and deletecount, which indicate the nodes to be deleted, to permit the semantic analysis routines to nullify any synthesized attributes, if required. Each time the parser shifts a node, evalshift is called with the stacktop variable after the operation is complete. Each time the parser performs a reduction, evalreduce is called with stacktop and the new parent node after the parent and its children are linked together, but before the reduction is performed, to make access to the children easier. When the parser completes parsing the new input string z' and begins reparsing y, already present in the old tree, evalphase2 is called to indicate that the parser has entered the next phase of the parse.

If the parse completes normally, evalfinish is called; otherwise evalsuspend is called, and the integer value it returns is saved in the marker node, to be passed to evalcontinue when this discontinuity is later reached. At the end of an editing session, evalclose is called to permit any data kept in memory to be written to disk, and to terminate the separate semantic analysis process, if one was begun.
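The shape of this protocol can be sketched as a Python class with one method per parser-invoked call in Figure 6-4. The method names come from the figure, but the class names and bodies are hypothetical. DeferredEvaluator illustrates the second strategy mentioned earlier, recording changes while the parser runs and evaluating them only at evalfinish, with evalsuspend and evalcontinue handing the pending changes across a discontinuity through the integer tag. The session and command routines (evalinit, evalclose, eval, evalerror, evaldebug) are omitted.

```python
from typing import Dict, List, Tuple


class SemanticHooks:
    """Default (no-op) versions of the parser-invoked calls of Figure 6-4."""

    def evalstart(self, stacktop, state): pass
    def evalcontinue(self, tag: int): pass
    def evaldelete(self, activenode, count: int): pass
    def evalshift(self, stacktop): pass
    def evalreduce(self, stacktop, rulenumber: int, parent): pass
    def evalphase2(self, yfirst): pass
    def evalsuspend(self, stacktop, state) -> int: return 0
    def evalfinish(self, subtreeroot) -> bool: return False  # False: no semantic errors


class DeferredEvaluator(SemanticHooks):
    """Records changes while the parser runs; evaluates them only in evalfinish."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, object]] = []
        self.saved: Dict[int, List[Tuple[str, object]]] = {}  # tag -> changes before a suspension
        self.next_tag = 1
        self.evaluated: List[Tuple[str, object]] = []

    def evalshift(self, stacktop):
        self.pending.append(("shift", stacktop))

    def evalreduce(self, stacktop, rulenumber, parent):
        self.pending.append(("reduce", rulenumber))

    def evalsuspend(self, stacktop, state) -> int:
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        self.saved[tag], self.pending = self.pending, []
        return tag

    def evalcontinue(self, tag):
        self.pending = self.saved.pop(tag) + self.pending

    def evalfinish(self, subtreeroot) -> bool:
        self.evaluated.extend(self.pending)  # stand-in for real attribute evaluation
        self.pending = []
        return False
```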
Three other semantic analysis routines exist which are tied to editor commands. Eval is called when the “eval” command is executed; it provides support for editing commands which access the symbol table produced during the semantic analysis. Evalerror corresponds to the “evalerror” command; it is called with the address of a node containing a semantic error, and returns an error message to be displayed for the user. Lastly, evaldebug is linked to the editor command of the same name, and provides an entry point to the semantic analysis routines to support interactive debugging, such as display of the data structures used by the semantic analysis routines.

These routine skeletons are provided by the editor so that the language implementor can interface the editor to a parser-generator of his choice, and so that different semantic analysis techniques can be tried.

The SAGA group is presently studying incremental semantic analysis and building an attribute evaluator for languages specified by regular right part LR(1) grammars [Beshers, 84] using maintained and constructor attributes [Beshers and Campbell, 85]. An independent investigation into semantic analysis based upon the CFF/AML system designed by Kaplan [Kaplan, 85] is also beginning. A non-incremental semantic evaluator was recently produced [Kimball, 85] to support a code generator that produces object code directly from the parse trees constructed by the editor. Further reports about semantic analysis schemes should appear in future Ph.D. dissertations and Master's theses by other members of the SAGA research group as this related research matures.

6.2.  The  Command  Interpreter 

The user of a SAGA editor inputs his program in free format from the keyboard; templates are not required, and no non-terminals appear on the screen. The editor is screen-oriented: the user positions the cursor at any point within the file of text on the screen and inserts, replaces, or deletes text directly at that position; the input is tokenized, parsed, and inserted into the program display window. At any time during the editing process, the user can request that the editor print the set of legal tokens (the follow set) that can be inserted at the cursor position. The user also can select more complex editor commands by using the command mode of the editor, which temporarily displays commands at the bottom of the screen. As such commands are executed, the screen is updated immediately to display the changed text. Editing commands enable the user to insert, delete, move, copy, or replace arbitrary fragments of text. These fragments can be selected by cursor positions, characters, strings, lines, syntactic constructions and, eventually, by semantic constructions within the text. For example, in a Pascal program, a user may select an if ... then ... else ... statement, discard the else ... part, and copy the remaining fragment to another location.

6.2.1. Basic Command Capabilities

Since the user's text is parsed and stored in parse tree form, it is possible to take advantage of this structure through structure-oriented commands which specify operations in terms of tokens or sub-trees. More significantly, because the SAGA editor is based upon an incremental parser and permits free-form input, it can also retain the text-oriented commands that manipulate characters and lines, unlike template-driven syntax-directed editors, which constrain editing to limited sub-tree replacements.

During the execution of an editing modification, the editor communicates with the parser through two routines: tokenize, which converts characters to terminal nodes, and parse, which integrates these nodes into the parse tree. All commands which modify the text are executed through this interface. The basic modification operation provided by the parser is to delete and/or insert a sequence of tokens at an arbitrary token position along the frontier of the parse tree.

It  is  possible  to  extend  the  editor/parser  interface  to  permit  the  deletion  and/or  insertion  of 
an  arbitrary  sequence  of  characters,  at  an  arbitrary  character  position  along  the  frontier  of  the 
tree,  by  defining  a data  structure  consisting  of  a buffer  of  characters  modbuf,  a pointer  deletenode 
to a node to be deleted, and a deletion count deletecount.

When a sequence of characters beginning in the middle of a token is to be deleted, the address of the node containing the token in which the sequence starts is assigned to deletenode, and deletecount is set to 1. The characters from the beginning of the token, up to but not including the first one to be deleted, are copied into the beginning of modbuf. Then deletecount is incremented for each additional token that contains characters to be deleted. If new characters will also be inserted, these characters are appended to modbuf, which is tokenized each time it contains a complete line of input. At the end of the insertion, any characters in the last token to be deleted which are not in the character string to be deleted are copied to the end of modbuf before it is tokenized.
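The computation of deletenode, deletecount, and modbuf can be sketched in Python. This is an illustration under a simplifying assumption: the frontier is given as a list of token texts, and character positions are offsets into their concatenation, so inter-token spacing is ignored. The function returns the index of deletenode rather than a node address, and it does not perform the line-by-line tokenizing of modbuf.

```python
from typing import List, Tuple


def char_modification(tokens: List[str], start: int, end: int,
                      inserted: str) -> Tuple[int, int, str]:
    """Map 'delete characters [start, end) and insert text' onto the token-level
    operation: return (index of deletenode, deletecount, modbuf)."""
    offsets = []  # offset of each token's first character
    pos = 0
    for text in tokens:
        offsets.append(pos)
        pos += len(text)
    # deletenode: the token in which the modification starts
    first = max(i for i, off in enumerate(offsets) if off <= start)
    # last token with characters to delete (the same token for a pure insertion)
    last = first if end <= start else max(i for i, off in enumerate(offsets) if off < end)
    deletecount = last - first + 1
    prefix = tokens[first][: start - offsets[first]]  # kept head of the first token
    suffix = tokens[last][max(start, end) - offsets[last]:]  # kept tail of the last token
    return first, deletecount, prefix + inserted + suffix
```

For instance, deleting "nt:=1" from the tokens count, :=, 100 gives deletenode 0, deletecount 3, and modbuf "cou00", which is then tokenized and handed to the parser.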

Now the parser is called, with activenode set to deletenode, deletecount supplying the number of nodes to be deleted, the nextusernode list supplying the nodes to be inserted, and parseoption set to the user's choice as to whether the parse should suspend or complete once the immediate modification is complete. The parser integrates the new input into the parse tree, treating any errors as discussed earlier.

Since  modifications  can  be  permitted  from  any  character  position  to  any  other  character 
position,  it  is  straightforward  to  provide  modifications  on  any  integral  number  of  tokens,  lines,  or 
sub-trees,  as  long  as  a mechanism  is  provided  to  the  user  to  specify  these  other  types  of  units. 
All  other  editor  commands  are  constructed  upon  this  basic  mechanism,  by  decomposing  more 
complex  editing  operations  into  sequences  of  this  basic  modification. 
6.2.2. Screen Mode

When invoked in screen mode, the SAGA editor displays a screenful of text (parse tree terminal nodes) and positions the terminal's cursor on the first node displayed. The user positions this cursor and selects sections of the tree to be acted upon by an editor command. Editor
commands  are  single  control  characters;  a line-mode  escape  jumps  the  cursor  to  the  bottom  row 
of  the  screen  to  permit  a line-mode  editor  command  to  be  typed.  The  control  characters  are  tied 
to  the  basic  line-mode  commands,  or  sequences  of  these  commands.  A map  table  is  planned  as  a 
future  extension  which  will  enable  the  user  to  customize  the  editing  interface. 

To insert text in screen mode, it is only necessary to position the cursor at the point of insertion and then type the characters to be inserted. All non-control characters are treated as data and are placed into the text buffer to be tokenized and parsed. While a line of input is being typed, single characters may be erased by typing a backspace (control-H), and the entire line of new input may be erased by typing control-U. Once a newline (or carriage return) is typed, the input line is immediately tokenized and queued for parsing. Each new line of input is treated in the same way. No syntactic parsing is actually done until the input is terminated with an escape character; alternatively, the user may request a partial parse by terminating the input with a control-P instead. At this point the parse is performed, and any lexical, syntactic, or semantic error is highlighted on the screen. The user may repair the error right away, scroll through other parts of the file, make another editing change before the point of the error, or exit the session (to repair the error in a future session).
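The keystroke handling described above can be sketched as a small state machine. In the hypothetical `ScreenInput` class, the whitespace-splitting tokenizer and the `'full'`/`'partial'` return values stand in for the editor's real tokenizer and parser entry points. Flushing a partly typed line when ESC or control-P arrives is also an assumption.

```python
BS, CTRL_U, CTRL_P, ESC = "\x08", "\x15", "\x10", "\x1b"


class ScreenInput:
    def __init__(self) -> None:
        self.line = []            # characters of the line being typed
        self.queued_tokens = []   # tokens of completed lines, awaiting the parse

    def feed(self, ch: str):
        """Handle one keystroke; return 'full' or 'partial' when a parse is requested."""
        if ch == BS:
            if self.line:
                self.line.pop()           # erase one character
        elif ch == CTRL_U:
            self.line.clear()             # erase the whole line of new input
        elif ch in ("\n", "\r"):
            self._queue_line()            # tokenize now, parse later
        elif ch in (ESC, CTRL_P):
            self._queue_line()            # assumption: flush a partly typed line
            return "full" if ch == ESC else "partial"
        elif ch < " ":
            pass                          # other control keys are editor commands (not modelled)
        else:
            self.line.append(ch)          # ordinary data character
        return None

    def _queue_line(self) -> None:
        # Placeholder tokenizer: split on white space.
        self.queued_tokens.extend("".join(self.line).split())
        self.line.clear()
```

Deferring the syntactic parse until ESC or control-P means a multi-line construct can be typed in full before the parser sees it. Only the cheap tokenizing step runs at each newline.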

6.2.3. Line Mode

The  line  mode  command  syntax  has  the  following  form: 

[arguments] command [parameters]
Argument Type               Syntax

Count      character        n{c|C}
           integer          n
           line             n{l|L}
           sub-tree         n{s|S}
           token            n{t|T}

Position   character        [illegible in source]
           range            [illegible in source]
           sub-tree         [illegible in source]
           tree node        a#

String                      "string-of-characters"

where:  n is an integer, *, or -*
        l1, l2 are single letters that name editor pointers or, if absent, the terminal's cursor
        * stands for maxint, the maximum integer value permitted on the system
        -* stands for -maxint
        a is the address of a parse tree node (an unsigned integer)

Table 6-1: Editor argument types and their syntax.

Only  the  command  name  is  required,  and  only  as  many  characters  as  necessary  to  disambiguate 
it  from  other  commands.  The  preceding  arguments  generally  specify  a section  of  the  parse  tree 
to  be  acted  upon  by  an  editing  command.  These  arguments  are  evaluated  by  the  editor’s  com- 
mand interpreter,  and  placed  onto  an  argument  stack  before  the  command  is  invoked.  Argu- 
ments only  apply  to  the  command  they  directly  precede,  unless  parentheses  are  used  to  distribute 
them  across  several  commands.  Not  all  commands  take  all  argument  types;  legal  ones  are  listed 
with  each  command,  while  illegal  ones  simply  cause  an  unexpected  argument  error  message  to  be 
displayed.  In  general,  commands  take  all  argument  types  which  “make  sense”  for  that  com- 
mand. 
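A small lookup function can illustrate the rule that a command name may be shortened to any unambiguous prefix. The command list comes from this section. The parentheses command is left out because it is written as symbols. Letting an exact full name win over longer names that share the prefix is an assumption, not documented behavior.

```python
COMMANDS = [
    "back", "forward", "set", "clear",                       # positioning
    "delete", "partial-delete", "insert", "partial-insert",  # modification
    "fread", "fwrite", "parse",
    "help", "error", "follow-set", "print",                  # informational
    "loop", "define", "exec", "off", "on", "quit",           # control
    "sh", "csh", "filter", "make",                           # environmental
]


def resolve(abbrev: str, commands=COMMANDS) -> str:
    """Return the unique command that `abbrev` abbreviates."""
    if abbrev in commands:          # assumption: an exact name always wins
        return abbrev
    matches = [c for c in commands if c.startswith(abbrev)]
    if not matches:
        raise ValueError(f"unknown command: {abbrev}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous command {abbrev!r}: {', '.join(matches)}")
    return matches[0]
```

For example, `q` resolves to `quit` and `fi` to `filter`. A bare `f` is rejected because it matches forward, fread, fwrite, follow-set and filter.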

The  trailing  parameters  specify  additional  arguments  specifically  for  that  command,  and 
that  command  only.  Unlike  preceding  arguments,  trailing  parameters  are  not  evaluated  before 
the  command  is  invoked,  but  are  placed  on  the  argument  stack  as  a string  of  characters.  This  is 
especially  useful  for  the  filter  command,  which  executes  a separate  process  specified  by  the  user, 
passing  to  it  these  parameters  as  command  line  arguments. 
In addition to the predefined commands, the user may also define new commands as sequences of already existing editor commands. This mechanism provides a convenient way to experiment with composite commands. Commands which are found to be particularly useful may be added to the basic command set so that they execute more efficiently.

Nine  types  of  arguments  are  presently  recognized:  integers;  counts  of  characters,  tokens, 
lines,  trees;  a character  position;  a range  (a  pair  of  character  positions);  a sub-tree  root  position; 
and  a character  string.  Counts  are  all  relative  to  the  location  of  the  editing  cursor,  positions  are 
at  a specific  tree  location,  integers  are  interpreted  as  appropriate  to  the  command,  and  character 
strings  represent  search  strings,  file  names,  and  so  on.  The  argument  types  and  their  syntax  are 
given  in  Table  6-1. 
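The following sketch shows one way the count and integer forms of Table 6-1 might be recognized. Two choices here are assumptions. The value of maxint is a 32-bit maximum, since the text says only that it is the system's maximum integer. Upper-case unit letters are treated as synonyms of the lower-case ones, because the table does not explain the distinction.

```python
import re

MAXINT = 2**31 - 1   # assumption: "the maximum integer value permitted on the system"
UNITS = {"c": "character", "l": "line", "s": "sub-tree", "t": "token"}

_COUNT = re.compile(r"^(-?\d+|-?\*)([clstCLST]?)$")


def parse_count(arg: str):
    """Parse n or n{c|l|s|t} (either case), where n is an integer, * or -*.

    Returns (value, unit); unit is None for a plain integer argument.
    """
    m = _COUNT.match(arg)
    if not m:
        raise ValueError(f"not an integer or count argument: {arg!r}")
    number, unit = m.groups()
    if number == "*":
        value = MAXINT
    elif number == "-*":
        value = -MAXINT
    else:
        value = int(number)
    return value, (UNITS[unit.lower()] if unit else None)
```

For example, `3t` yields `(3, "token")` and `*L` yields `(MAXINT, "line")`, which a forward command could read as "to the end".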

6.2.4.  Predefined  Commands 

The  editor’s  predefined  commands  may  be  grouped  according  to  function:  positioning  com- 
mands, modification  commands,  formatting  commands,  informational  commands,  control  com- 
mands, and  environmental  commands.  Table  6-2  presents  the  argument  types  permitted  with 
each  command.  Each  of  these  command  groups  is  described  below. 

6.2.4.1. Positioning Commands

The positioning commands move the editing cursor through the text displayed on the screen (and through the frontier of the parse tree), and also place auxiliary editing pointers into the parse tree for later reference. There are four commands: back and forward for cursor positioning, and set and clear for auxiliary pointer placement. Each of these commands corresponds to a line-mode command. In screen mode, characters can be mapped to either specific commands or specific argument/command pairs, so that move-by-char, move-by-token, move-by-line and move-by-tree commands can be made single keystrokes, appearing as individual commands although they are actually a single command invoked with different arguments.

Table 6-2: Basic editor commands grouped by function, showing the argument types permitted with each one.
6.2.4.2. Modification Commands

Modification commands change the text of terminal tokens and/or the structure of the parse tree. These commands are delete, partial-delete, insert, partial-insert, fread, fwrite and parse. The delete and insert commands request a complete parse; fread corresponds to an insert taken from a text file; and parse invokes the parser at a particular location to remove a suspension point left by some earlier suspended parse.

6.2.4.3. Formatting Commands

Formatting  commands  rearrange  the  display  of  the  text  on  the  screen  by  altering  the 
number  of  newlines  and  spaces  between  terminal  tokens.  They  differ  from  modification  com- 
mands in  that  neither  the  terminal  token  text  nor  the  parse  tree  structure  is  altered,  so  that  the 
parser is not invoked. Reformatting which would cause two tokens to be rescanned as a single token is prohibited; a deletion command is required instead to remove the intervening spaces.

6.2.4.4. Informational Commands

These commands provide assistance to the user. Four are predefined: help, error, follow-set and print. The help command lists the editor commands and available help topics; help keyword provides more specific help about the command or topic supplied in keyword. The error command displays an error message for the highlighted token under the editing cursor. By default the editor provides only the generic messages lexical error, syntax error and semantic error; customized code must be written for one of the language-dependent modules, or a filter process, in order to provide more language-specific diagnostics.

The follow-set command asks the editor to display the set of legal tokens which could be inserted just before the token on which the editing cursor is positioned. It can be a valuable diagnostic for a user with a non-obvious syntax error to repair, for a language implementer verifying his parse tables interactively, and for a programmer becoming acquainted with a new language. The print command is only necessary when the editor is run in line mode, to list portions of text.
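In a table-driven LR parser, one way to compute such a follow set is to list the terminals that have a non-error entry in the ACTION row of the current parser state. The sketch assumes a hypothetical dictionary layout for the table. With canonical LR(1) tables the result is exact. LALR tables may list a token that is only rejected after a reduction has been made.

```python
# Hypothetical ACTION table: state -> {terminal: action}; missing entries mean "error".
ACTION = {
    0: {"id": ("shift", 2), "(": ("shift", 3)},
    2: {"+": ("reduce", 1), ")": ("reduce", 1), "$": ("reduce", 1)},
}


def follow_set(state: int, action=ACTION) -> list:
    """Terminals that may legally come next when the parser is in `state`."""
    return sorted(action.get(state, {}))
```

The state to query is the one the parser reaches after consuming the text up to the cursor. An incremental parser that keeps its states in the tree already has that state on hand.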

6.2.4.5. Control Commands

The control commands affect the editor's internal environment, setting options and controlling command definition and execution. These commands are parentheses, loop, define, exec, off, on and quit. The parentheses command, specified with a pair of parentheses, groups several commands together to distribute arguments or perform an iteration. The loop command executes the following command until it fails; applied to parentheses, it iterates over a sequence of commands until one produces an error return (e.g. forward when positioned at the end of the parse tree).
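The loop semantics can be sketched by modelling each command as a callable that returns True on success. That convention and the `max_iterations` safety limit are assumptions. The text specifies only that iteration stops when a command returns an error.

```python
from typing import Callable, Sequence


def loop(commands: Sequence[Callable[[], bool]], max_iterations: int = 1_000_000) -> int:
    """Repeat the whole command sequence until some command fails.

    Returns the number of complete passes made before the failure.
    """
    passes = 0
    while passes < max_iterations:
        for cmd in commands:
            if not cmd():
                return passes
        passes += 1
    return passes
```

Applied to a single forward command over a five-token frontier, the loop advances four times and stops when forward fails at the last token. This matches the end-of-tree example above.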

The  define  command  associates  a name  with  a sequence  of  editor  commands;  use  of  this 
name  as  a command  invokes  this  command  sequence.  The  exec  command  takes  a file  name 
string  argument  and  reads  and  executes  the  editor  commands  specified  in  the  file.  It  is  used  to 
define  and  execute  commands  in  a file  during  editor  initialization,  and  to  pass  command  se- 
quences from  a filter  process  back  to  the  editor  for  execution. 
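A minimal sketch of define and exec as a macro expander follows. The `CommandInterpreter` class is hypothetical, and primitive commands are only recorded, not executed. Using `;` to separate the commands of a definition is an assumption. A depth limit guards against a definition that invokes itself.

```python
class CommandInterpreter:
    def __init__(self, primitives):
        self.primitives = set(primitives)
        self.macros = {}     # name -> list of command strings
        self.executed = []   # primitive commands actually run, in order

    def run(self, line: str, depth: int = 0) -> None:
        if depth > 50:
            raise RecursionError("user-defined command nests too deeply")
        words = line.split()
        if not words:
            return
        name, args = words[0], words[1:]
        if name == "define":
            # assumption: ';' separates the commands of a definition
            body = " ".join(args[1:])
            self.macros[args[0]] = [c.strip() for c in body.split(";") if c.strip()]
        elif name == "exec":
            with open(args[0]) as f:         # one editor command per line
                for cmd in f:
                    self.run(cmd, depth + 1)
        elif name in self.macros:
            for cmd in self.macros[name]:    # expand the stored sequence
                self.run(cmd, depth + 1)
        elif name in self.primitives:
            self.executed.append(" ".join(words))
        else:
            raise ValueError(f"unknown command: {name}")
```

Because exec feeds file lines through the same `run` method, a command file written by a filter process can use both primitives and previously defined commands.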

The off and on commands take several keyword arguments and set or clear corresponding Boolean variables in the editor which control its behavior. These commands are mostly used to toggle debug and trace variables interactively, to monitor and measure editor execution. Lastly, quit terminates an editor session.

6.2.4.6. Environmental Commands

The environmental commands affect the editor's external environment, such as its interface to the file system and to other processes running on the system. Four are presently defined: sh, csh, filter and make. The sh and csh commands pass their arguments to the UNIX shell and C shell programs, respectively, for execution. When the editor runs in screen mode, the results of these programs can be placed into a pop-up window on the user's terminal.

The filter command executes the named process, passing it the name of the file currently being edited, an optional sub-tree root or token range, and any other command-specific parameters given on the command line. Filter processes greatly increase the power of the editor by providing a modular way to perform analyses on existing parse trees. More will be said about filter processes later in this chapter.

6.2.5. User-Defined Commands

Given  the  basic  command  set  described  above,  a number  of  additional  commands  can  be 
defined  and  included  in  all  editors  to  provide  increased  functionality.  A copy  operation  can  be 
defined  by  pick  and  put  commands: 

define  pick  fwrite  tempfile 

define  put  fread  tempfile 

Splitting a copy into two components means only one location need be specified at a time. This simplifies the operation, since the user does not need to keep both the source and destination locations in mind at once. An additional variant can be created which picks up and deletes, to perform a move operation.
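The pick/put pair can be sketched on a toy character buffer to show why only one location is needed at a time. Pick writes the selection to a temporary file, and put later reads it back at wherever the cursor is by then. The `Editor` class and its method signatures are illustrative assumptions, not the SAGA editor's interface. `move_pick` is the pick-and-delete variant mentioned above.

```python
import os
import tempfile


class Editor:
    """Toy character buffer with a cursor, standing in for the parse-tree editor."""

    def __init__(self, text: str) -> None:
        self.text, self.cursor = text, 0
        fd, self.tempfile = tempfile.mkstemp(suffix=".pick")
        os.close(fd)

    def fwrite(self, start: int, end: int, path: str) -> None:
        with open(path, "w") as f:
            f.write(self.text[start:end])

    def fread(self, path: str) -> None:
        with open(path) as f:
            data = f.read()
        self.text = self.text[:self.cursor] + data + self.text[self.cursor:]

    def delete(self, start: int, end: int) -> None:
        self.text = self.text[:start] + self.text[end:]

    # The user-defined commands: each needs only one location.
    def pick(self, start: int, end: int) -> None:
        self.fwrite(start, end, self.tempfile)

    def put(self) -> None:
        self.fread(self.tempfile)

    def move_pick(self, start: int, end: int) -> None:
        self.pick(start, end)
        self.delete(start, end)

    def close(self) -> None:
        os.remove(self.tempfile)
```

Between `pick` and `put` the user is free to move the cursor anywhere. The source location is no longer needed once the text is in the temporary file.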

Commands such as move to the end of the line can be constructed from the predefined commands: move 1 line forward; move 1 character back. Any of these can be used in screen mode by assigning a control character to the user-defined command.

Command extensibility permits us to try out different command combinations easily, and permits language-dependent operations to be defined for each different type of editor. Advanced language-dependent operations, such as tree transformations, can be performed through a separate process tied to the editor, and invoked through a user-defined command. The extensibility supports customization of the editor toward a specific development environment, and makes many more resources available to the user.

6.2.6. Screen Management

The  SAGA  editor  employs  the  Maryland  Window  Package  [Torek,  83]  as  its  screen 
manager.  This  package  references  the  /etc/termcap  terminal  capability  file  available  on  UNIX 
systems  to  determine  the  characteristics  of  the  terminal  in  use.  The  package  supports  the  de- 
claration of  a text  buffer  and  associated  window  into  that  buffer,  with  the  window  placed  on  some 
portion of the terminal screen. Multiple, overlaid windows are supported, and the package runs
an  algorithm  to  detect  moved  blocks  of  text  as  well  as  isolated  modifications,  and  attempts  to 
send  a minimal  number  of  characters  to  the  terminal  to  update  the  display  to  correspond  with  its 
internal  text  image. 

The  package  provides  a flexible  environment  for  the  editor,  which  uses  the  overlaid  window 
capability for pop-up windows that contain information such as a terminal follow set or output
from  a filter  process  run  from  within  the  editor. 

6.2.7.  Invoking  the  Editor 

The editor¹ is invoked with a command of the form:

epos [options] <name>

Options are single letters preceded by a minus sign, and the names are SAGA directories containing structured files. The editor can be run in either screen mode or line mode, depending on the terminal in use and the user's preference. The command epos <name> attempts to run the editor in screen mode and, failing that, uses line mode; epos -l <name> forces the editor to use line mode, regardless of the terminal's screen capabilities.

¹ The SAGA editor has been tentatively named epos, until a more suitable name is found. Webster's dictionary defines epos as “epic poetry”, appropriate since the SAGA project is investigating software development and the software life-cycle for full programming languages and grammars, not just simple subsets; and also as “an epic poem, handed down by word of mouth”, appropriate for the earlier days of the editor development, since new features became available some time before they became documented!

The <name> argument is used to create a directory which will contain the files of structured data (parse tree, symbol table, object code library, etc.) that will be produced by the editor. If it already exists, <name> must be a directory containing the structured data files from an earlier editing session. Files in directory <name> should only be modified by SAGA programs, and only files created by SAGA programs should reside there. (During program execution, a number of temporary files of varying names are created, and name collisions are possible.)

Although  actually  a directory,  <name>  can  be  thought  of  conceptually  as  a file  containing 
structured  information,  and  since  the  user  need  not  be  concerned  with  the  actual  organization  of 
the  information  in  this  directory,  it  will  be  referred  to  as  a file  throughout  this  section. 
<Name>  must  have  been  produced  by  a SAGA  program  recognizing  the  same  language  as  the 
editor  being  invoked. 

6.3.  Filter  Processes 

The  SAGA  editor  provides  a mechanism  by  which  separate  processes  can  be  invoked  during 
an  editing  session  to  traverse  portions  of  the  parse  tree  being  edited.  These  processes,  termed 
filter processes, read, analyze and possibly transform the parse tree, returning the result to the
editor.  By  defining  new  commands  with  the  editor’s  user-defined  command  facility,  which  in- 
voke filter  processes,  authors  of  filters  can  provide  complex  operations  as  simple  commands.  A 
tree  plotter,  diagrammer,  compactor,  rule  frequency  counter,  pretty  printer,  and  a Pascal  tree 
transformation  program  have  already  been  written  using  this  facility. 
Since the editor constructs a parse tree, it is a simple matter to make this tree available for additional analysis by other programs. These programs, using predefined library routines, walk the parse tree collecting data. They can modify some fields in the tree directly, and can transform the structure of the tree by writing a text file to be passed back to the editor to be parsed and inserted in place of some portion of the existing tree. They can also produce editor command files, to be executed once the filter process terminates. The last command in this file can invoke the filter process again, in effect producing a co-routine. The editor provides both user-defined command sequences and command files to facilitate the use of these programs.

The  SAGA  editor  contains  a filter  command  which  takes  the  name  of  the  filter  process  as 
an argument, and arranges to execute the program as a sub-process of the editor. This command automatically supplies the name of the parse tree directory as the first argument to the
program,  and  optionally  supplies  a parse  tree  node  number  as  a second  argument  if  a sub-tree  is 
selected  by  the  user  to  be  passed  to  the  filter  command.  Any  other  arguments  given  to  the  filter 
command  are  passed  along  to  the  filter  process  after  these  initial  arguments.  Thus  the  filter  pro- 
cess is  executed  with  the  following  arguments: 

<filtername> <parse-tree-directory> [<tree-node>] [<args to filter cmd>]
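Assembling this argument vector is straightforward, as the sketch below shows. The optional node number and the trailing parameters are handled as described above. Splitting the trailing parameter string with shell-like quoting rules (`shlex.split`) is an assumption about how the editor divides it into separate arguments.

```python
import shlex


def filter_argv(filtername: str, tree_dir: str, node=None, extra: str = "") -> list:
    """Build: filter name, parse-tree directory, optional tree-node number,
    then the filter command's trailing parameters (kept unevaluated until now)."""
    argv = [filtername, tree_dir]
    if node is not None:
        argv.append(str(node))
    argv.extend(shlex.split(extra))
    return argv
```

The resulting list could then be passed to `subprocess.run(argv, check=True)` to start the filter as a child process.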

At  each  node  in  the  tree,  the  appropriate  library  routine  can  be  used  to  retrieve  the  fields  of 
interest  in  the  node.  Should  it  be  desired  to  make  modifications  to  the  tree,  two  approaches  may 
be  used.  To  transform  the  tree,  a text  file  should  be  created  into  which  the  new  text  to  be  insert- 
ed into  the  tree  is  placed.  If  the  filter  command  in  the  editor  is  placed  into  a user-defined  com- 
mand sequence,  then  additional  commands  in  this  sequence  can  cause  the  deletion  of  the  sub-tree 
which  was  passed  to  the  filter  followed  by  the  insertion  of  the  new  text  from  this  file. 

For more complex modifications, the filter process can create a command file which contains a combination of editor commands and input data. The user-defined command sequence which executes the filter command can then invoke the editor's exec command on the file produced by the filter process; commands in this file will then guide the modifications to be made.

6.4.  Demand-Paged  Data  Structures 

Pascal  provides  no  mechanism  to  support  random  access  to  files.  Since  parse  trees  can 
grow  large  and  the  editor  would  like  to  be  able  to  run  with  only  a small  portion  of  the  tree 
memory-resident,  a module  was  written  which  permits  a program  written  in  Berkeley  Pascal  to 
randomly  access  records  in  a file.  The  paging  routine  module  provides  an  interface  by  which  the 
records  in  this  file  can  be  accessed  and  modified.  Only  a small  portion  of  the  file  needs  to  be 
memory  resident  at  any  time;  the  package  implements  a demand-pager  to  move  the  data  in  and 
out  of  memory  as  required.  The  programmer  specifies  a record  to  be  paged  and  provides  a buffer 
(an  array  of  records)  to  contain  a portion  of  the  file  in  memory.  The  routines  in  the  package  can 
also  be  used  to  define  an  interface  to  treat  the  records  as  an  encapsulated  data  type,  and  imple- 
ment additional  access  routines  to  provide  access  to  the  fields  in  the  record  in  an  implementation 
independent  manner. 

The  paging  system  provides  access  to  a potentially  large  file  of  records  through  a possibly 
small  area  of  memory  available  to  a program.  Conceptually,  the  file  may  be  thought  of  as  an  ar- 
ray of  records,  the  first  one  labeled  with  index  1,  and  with  no  upper  bound.  As  higher  and  higher 
indices  are  referenced,  additional  pages  are  added  to  the  file.  The  file  is  limited  in  size  only  by 
UNIX  system  imposed  restrictions  (typically  the  amount  of  free  space  on  the  file  system  contain- 
ing the  file). 

Each record in this file can be read or written independently of all others in the file, in any order whatsoever. The programmer using the paging system simply specifies the index of the record in the file he wishes to access, and the record will be swapped into memory if not already present, and made available to him. Figure 6-5 illustrates both the concept and the implementation scheme used by the routines.

The  records  to  be  paged  can  be  any  size  up  to  but  not  greater  than  the  size  of  the  disk  page 
which  is  swapped  by  the  operating  system.  On  older  systems,  this  size  is  typically  512  bytes, 
although  page  sizes  of  1024,  4096,  and  8192  bytes  are  also  common. 

Since all disk i/o is performed a page at a time, no record is stored across two pages, as this would double the overhead of retrieving the record. Instead, as many records as will fit onto a single page are stored on that page, and the remaining space is left as a “hole”, which is not used by the paging system.
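The layout arithmetic can be made concrete. With a page size of P bytes and a record size of r ≤ P bytes, each page holds floor(P/r) records and leaves a hole of P - r*floor(P/r) bytes. For 100-byte records on 512-byte pages, that is 5 records and a 12-byte hole. The sketch below also maps a 1-based record index to a page number, counted from 0, and a byte offset within that page. The helper names are hypothetical.

```python
PAGE_SIZE = 512  # bytes; the typical size on older systems according to the text


def layout(record_size: int, page_size: int = PAGE_SIZE):
    """Return (records per page, size of the unused hole in bytes)."""
    if not 0 < record_size <= page_size:
        raise ValueError("a record must fit within a single page")
    per_page = page_size // record_size
    hole = page_size - per_page * record_size
    return per_page, hole


def locate(index: int, record_size: int, page_size: int = PAGE_SIZE):
    """Map a 1-based record index to (page number, byte offset within the page)."""
    per_page, _ = layout(record_size, page_size)
    page, slot = divmod(index - 1, per_page)
    return page, slot * record_size
```

The record's byte position in the file is then `page * page_size + offset`. No record ever straddles a page boundary.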

The  data  is  stored  in  memory  as  an  array  of  records.  The  user’s  program  must  contain  a 
declaration  of  the  record,  and  a pointer  to  an  array  of  records  to  be  used  as  a buffer  to  contain 
the  pages  of  records  which  will  be  swapped  into  and  out  of  memory  by  the  paging  system.  The 
routines  use  a page  table  and  buffer  table  to  store  the  information  needed  to  manage  the  data. 
This  information  is  hidden  from  the  user,  and  it  is  not  necessary  to  understand  these  structures 
in order to use the paging routines; these structures are shown in Figure 6-5 only for completeness and the interest of the reader.
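A compact sketch of such a demand pager follows. A small set of in-memory page frames stands in for the buffer table, and an ordered dictionary of resident pages stands in for the page table. Several details are assumptions, since the text does not say which policy the SAGA routines use: least-recently-used replacement, byte-string records, and zero-filling of pages that have never been written.

```python
import tempfile
from collections import OrderedDict


class PagedFile:
    """Unbounded 1-based array of fixed-size records kept in a file,
    with at most `frames` pages resident in memory at once."""

    def __init__(self, record_size, page_size=512, frames=4, path=None):
        if not 0 < record_size <= page_size:
            raise ValueError("a record must fit within one page")
        self.record_size, self.page_size, self.frames = record_size, page_size, frames
        self.per_page = page_size // record_size   # leftover bytes are the unused "hole"
        self.resident = OrderedDict()              # page number -> [bytearray, dirty]
        self.file = open(path, "w+b") if path else tempfile.TemporaryFile()

    def _page(self, pno):
        if pno in self.resident:
            self.resident.move_to_end(pno)         # most recently used goes last
            return self.resident[pno]
        if len(self.resident) >= self.frames:      # evict the least recently used page
            old, (data, dirty) = self.resident.popitem(last=False)
            if dirty:
                self._write_page(old, data)
        self.file.seek(pno * self.page_size)
        data = bytearray(self.file.read(self.page_size))
        data.extend(bytes(self.page_size - len(data)))  # never-written pages read as zeros
        self.resident[pno] = [data, False]
        return self.resident[pno]

    def _write_page(self, pno, data):
        self.file.seek(pno * self.page_size)
        self.file.write(data)

    def _locate(self, index):
        if index < 1:
            raise IndexError("records are numbered from 1")
        pno, slot = divmod(index - 1, self.per_page)
        return pno, slot * self.record_size

    def read(self, index):
        pno, off = self._locate(index)
        data = self._page(pno)[0]
        return bytes(data[off:off + self.record_size])

    def write(self, index, record):
        if len(record) != self.record_size:
            raise ValueError("record has the wrong size")
        pno, off = self._locate(index)
        entry = self._page(pno)
        entry[0][off:off + self.record_size] = record
        entry[1] = True                            # write back when evicted or closed

    def close(self):
        for pno, (data, dirty) in self.resident.items():
            if dirty:
                self._write_page(pno, data)
        self.file.close()
```

Each `read` or `write` costs one method call even when the page is already resident. This is the per-reference overhead mentioned below.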

The  cost  of  these  functions  is  the  increased  overhead  of  a procedure  call  per  record  refer- 
ence. These  routines  are  used  by  the  SAGA  language-oriented  editor  to  manage  the  parse  trees 
which  are  constructed  during  the  editing  process.  This  results  in  faster  response  time  for  large 
programs  since  the  entire  tree  does  not  need  to  be  read  into  memory. 

Concept: (unbounded) sequence of records 1, 2, 3, 4, 5, ..., n. Implementation: disk file of pages of records.

Figure 6-5: A Demand-Paged File, used for the editor's parse tree and string table. These structures are paged into memory on demand, permitting the editor to run with only a small portion of the parse tree memory resident during an editing session. The paging module is available for use with other Pascal programs, and can be used to support any data structures which can be stored as an array of records.

6.5. Summary

This chapter has covered the modules of the SAGA editor. By parsing the user's text as it is input, the editor provides additional analysis sooner than previously available, eliminating the need to run a compiler merely to locate and repair syntax and semantic errors, and reducing the
time  from  coding  to  test.  By  implementing  the  editor’s  command  interpreter  over  an  incremen- 
tal, table-driven,  LR(1)  parser,  it  has  been  possible  to  retain  common  text  editing  commands 
while  augmenting  the  user  interface  with  structure-oriented  commands  which  increase  the  level 
of  abstraction  of  the  user  interface.  This  permits  editing  operations  to  be  specified  in  terms 
closer  to  the  problem  at  hand.  Interfaces  are  provided  to  language-dependent  lexical,  syntactic, 
and  semantic  analysis  modules,  permitting  the  use  of  any  parser  generating  system  which  can 
meet  the  requirements  of  the  interface,  and  permitting  language  implementors  to  use  a formal 
specification  grammar  or  other  notation  with  which  they  are  familiar. 

An  extensible  command  set  permits  customization  of  the  editor,  and  allows  it  to  draw  on 
other  tools  in  the  development  environment  to  perform  additional  analyses  and  operations  for 
the  user  from  within  the  editor.  Through  the  use  of  filter  processes,  the  editor  provides  the  capa- 
bility of  performing  semantic  analysis  in  a separate  process  running  in  parallel  with  the  editor, 
which  should  lessen  delays  in  response  time  when  semantic  processing  is  being  performed.  The 
use  of  a demand-paged  data  structure  to  store  the  parse  tree  permits  use  of  the  editor  with  large 
programs  on  systems  with  limited  available  memory.  In  the  next  chapter,  editor  generation  is 
discussed,  completing  the  presentation  of  the  SAGA  language-oriented  editor. 
CHAPTER 7

EDITOR  GENERATION 


The  SAGA  Editor  has  been  designed  to  be  easily  retargetable  to  additional  languages. 
Most  of  the  editor’s  modules  are  language-independent,  and  can  be  used  intact  when  an  editor  is 
produced  for  another  language.  Only  the  lexical,  syntactic,  and  semantic  analysis  modules  need 
to  be  altered,  and  the  extent  of  the  alterations  is  dependent  upon  the  parser-generating  system 
being  used  to  process  the  language  specification. 

The  lexical,  syntactic,  and  semantic  analysis  modules  are  generated  by  or  written  for  use 
with  a specific  parser-generator  facility.  The  generator  program  reads  one  or  more  input  files 
which  contain  formal  descriptions  of  the  language-specific  information.  This  information  con- 
sists of  a formal  description  of  the  syntax  of  the  language  in  the  form  of  a grammar,  information 
about  the  lexical  representations  of  the  tokens  and  semantic  evaluation  information  in  the  form 
of  executable  code  fragments  or  attributed  grammars,  depending  on  the  parser-generator  used. 
The  parser-generator  produces  parse  tables  and  associated  information  which  is  combined  with 
the  parser-generator  dependent  library  routines  and  the  common  editor  object  code  to  produce 
an editor for a particular language. Figure 7-1 illustrates the generation of a SAGA editor.

Figure 7-1: SAGA Editor Generation

7.1. The Mystro Parser-Generator System

The Mystro parser-generating system [Noonan and Collins, 84] uses a customized subroutine to perform the lexical analysis, a formal BNF-grammar description of the syntax of the language, and code fragments attached to the production rules of the grammar to perform semantic evaluations whenever the parser performs a reduction by that rule. The lexical analysis is accomplished in the tokenize routine, which consists of a case statement and associated subroutines to scan the input buffer and recognize specific tokens, followed by code to construct a terminal node for each token and append it to a list to be returned to the caller of the routine. To adapt lexical analysis for another language, this routine can be copied from a file already in existence for another language and edited to insert or delete cases to/from the case statement to reflect the different lexical classes that need to be recognized for the new language. All of the code to scan the input buffer and construct the terminal nodes can be reused unchanged. If the lexical structure for the new language is similar to that of a language which has already been specified for a SAGA editor, then the modifications are straightforward and take little time.
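The structure of such a tokenize routine can be sketched as follows, with an if/elif chain standing in for the Pascal case statement. The lexical classes in the hypothetical `SINGLE` and `DOUBLE` tables are illustrative and Pascal-like. They are the part one would edit for a new language. The scanning helper and the terminal-node construction would be reused unchanged.

```python
from dataclasses import dataclass


@dataclass
class Terminal:
    kind: str   # lexical class, e.g. "id", "num", "assign"
    text: str   # the characters of the token
    pos: int    # offset of the token in the input buffer


# Language-specific cases (assumed, Pascal-like): edit these for a new language.
SINGLE = {"+": "plus", "-": "minus", "*": "star", "/": "slash",
          "(": "lparen", ")": "rparen", ";": "semi", "=": "eq"}
DOUBLE = {":=": "assign", "<=": "le", ">=": "ge", "<>": "ne"}


def scan_while(buf: str, i: int, pred) -> int:
    """Shared scanning helper: advance past characters satisfying pred."""
    while i < len(buf) and pred(buf[i]):
        i += 1
    return i


def tokenize(buf: str) -> list:
    tokens, i = [], 0
    while i < len(buf):
        ch, start = buf[i], i
        if ch.isspace():                       # case: white space
            i += 1
            continue
        if ch.isalpha():                       # case: identifier
            i = scan_while(buf, i, lambda c: c.isalnum() or c == "_")
            kind = "id"
        elif ch.isdigit():                     # case: integer literal
            i = scan_while(buf, i, str.isdigit)
            kind = "num"
        elif buf[i:i + 2] in DOUBLE:           # case: two-character operator
            i += 2
            kind = DOUBLE[buf[start:i]]
        elif ch in SINGLE:                     # case: one-character operator
            i += 1
            kind = SINGLE[ch]
        else:
            raise SyntaxError(f"lexical error at offset {i}: {ch!r}")
        tokens.append(Terminal(kind, buf[start:i], start))  # build the terminal node
    return tokens
```

Checking the two-character operators before the single-character ones makes `:=` come out as one token and never as `:` followed by `=`.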

At  the  syntax  analysis  level,  a formal  BNF  grammar  must  be  specified  which  is  LR(1);  the 
challenge  to  the  language  implementor  is  to  get  the  specification  into  this  form,  eliminating  all 
shift/reduce  and  reduce/reduce  conflicts.  Unfortunately,  at  this  time  the  Mystro  system  does  not 
permit  operator  precedence  specification  and  ambiguous  grammars,  so  it  is  necessary  to  com- 
pletely specify  the  precedence  of  operators  in  the  structure  of  the  production  rules  and  hence  the 
parse  tree.  For  a language  like  Pascal,  this  is  not  too  difficult,  since  there  are  a limited  number  of 
precedence  levels.  However,  for  a language  such  as  C,  there  are  so  many  precedence  levels  that 
the  parse  trees  become  heavy  with  renaming  rules  in  the  sections  involving  operators  and 
operands. 

The  Mystro  system  will  still  produce  a file  of  parse  tables  for  the  syntax  of  ambiguous 
grammars,  though  the  parser  will  always  default  to  using  the  first  applicable  action  which  it  en- 
counters. But  because  the  tables  are  produced,  it  is  possible  to  post-process  them  manually,  and 
in  many  cases  edit  the  tables  and  resolve  the  conflicts  in  favor  of  one  or  another.  In  the  case  of 
the  Pascal  grammar,  it  is  possible  to  replace  the  production  rules  given  in  Figure  7-2(a)  with 
those  in  7-2(b),  run  the  resulting  ambiguous  grammar  through  the  parser-generator,  and  then 
edit  the  resulting  tables.  The  effect  is  that  all  renaming  rules  of  the  form: 

<simple-expression> → <term> → <factor> → <id>

disappear from the parse tree and are replaced by:

<simple-expression> → <id>

directly. Parse trees produced by editors for both of these grammars were analyzed for production rule frequency and tree size, and it was found that the trees resulting from the ambiguous grammar contained 27% fewer nodes, a significant saving in both space and parser processing time.
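The conflicts left in the tables can also be resolved mechanically, using the precedence rule that yacc applies. Suppose a reduction by a rule whose operator is b competes with shifting the operator a. The parser reduces if b binds at least as tightly as a (assuming left associativity) and shifts otherwise. The sketch below applies that rule with precedence levels matching Figure 7-2(b). This is not Mystro's behavior, which leaves such edits to the implementor. The conflict representation is hypothetical.

```python
# Precedence levels following Figure 7-2(b): +, - and or bind less tightly.
PREC = {"+": 1, "-": 1, "or": 1, "*": 2, "/": 2, "div": 2, "mod": 2, "and": 2}


def resolve_conflict(rule_op: str, lookahead: str) -> str:
    """Choose 'shift' or 'reduce' for a shift/reduce conflict between reducing
    a rule whose operator is rule_op and shifting the lookahead operator."""
    if PREC[rule_op] > PREC[lookahead]:
        return "reduce"
    if PREC[rule_op] < PREC[lookahead]:
        return "shift"
    return "reduce"   # equal precedence: treat the operators as left-associative


def resolve_table(conflicts: dict) -> dict:
    """conflicts maps (state, lookahead) -> rule_op; returns (state, lookahead) -> action."""
    return {key: resolve_conflict(op, key[1]) for key, op in conflicts.items()}
```

For `a + b * c`, the parser holds `a + b` when it sees `*`, so it shifts and `b * c` is grouped first. For `a * b + c`, it reduces `a * b` before shifting `+`.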

The parser-generator takes the code fragments associated with the production rules in the grammar specification and combines them into a case statement indexed by rule number; this statement is placed automatically into one of the language-dependent analysis files during parser generation. The SAGA editor supports semantic analysis performed either in sequence with the parse or after the syntax analysis has completed; in the latter case, a separate process can be employed to perform the semantic analysis. The use of a separate process is encouraged: the semantic analysis must presently be written as code fragments, but if it runs in its own process and the lexical analysis could also be made table-driven, it would be possible to produce a single editor that loads the lexical and syntax tables at run time, instead of customized editors instantiated one per language. To produce an editor for a new language, only new tables would need to be produced; these could be used with an existing editor binary, saving the storage space used for the editor programs and permitting all SAGA editors to run from a single text image in memory.

<simple_expres> → <simple_expres> <add_op> <term> | <term>
<add_op> → + | - | or
<term> → <term> <mult_op> <factor> | <factor>
<mult_op> → * | / | div | mod | and
<factor> → <variable>

(a) Section of original unambiguous grammar

<simple_expres> → <simple_expres> <op> <variable> | <variable>
<op> → + | - | or | * | / | div | mod | and

(b) Equivalent ambiguous grammar, assuming +, - and or
are assigned lower precedence than the remaining operators.

Figure 7-2: A grammar simplification resulting in more efficient parse trees.
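The generated case statement is, in effect, a dispatch on rule number executed at each reduction. The Python sketch below illustrates the two modes described above under stated assumptions: a dictionary stands in for the Pascal case statement, the `Node` class and the two actions are hypothetical, and "deferred" mode models analysis after the parse (for example in a separate process) by replaying the reductions in the order they occurred.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    symbol: str
    rule: Optional[int] = None       # production rule number; None for tokens
    children: list = field(default_factory=list)
    value: object = None             # attribute set by semantic actions


# Hypothetical semantic actions for rule 1: E -> E + T and rule 2: E -> T.
def add_action(node):
    node.value = node.children[0].value + node.children[2].value


def copy_action(node):
    node.value = node.children[0].value


# Stand-in for the generated case statement: rule number -> action.
SEMANTIC_ACTIONS = {1: add_action, 2: copy_action}


class Reducer:
    """Builds parse tree nodes and runs their semantic actions."""

    def __init__(self, deferred=False):
        self.deferred = deferred
        self.pending = []            # reductions awaiting later analysis

    def reduce(self, rule, lhs, children):
        node = Node(lhs, rule, list(children))
        if self.deferred:
            self.pending.append(node)    # analyze after the parse completes
        else:
            self.run(node)               # analyze in sequence with the parse
        return node

    def run(self, node):
        action = SEMANTIC_ACTIONS.get(node.rule)
        if action is not None:           # rules without a fragment do nothing
            action(node)

    def finish(self):
        # Reductions occur bottom-up, so replaying them in their original
        # order always reaches children before their parents.
        for node in self.pending:
            self.run(node)
        self.pending.clear()
```

Either mode yields the same attributes; deferring simply moves the work out of the parse loop.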

The SAGA group has found the Mystro parser-generator to be a stable and reliable system, and of great use in the development of new SAGA editors. If a future version supported a formal specification of lexical classes, the manual modification of the tokenizing routine could be eliminated; if it accepted ambiguous grammars together with precedence specifications for the tokens that arise in ambiguous constructs, grammars could be written that produce potentially much more efficient parse trees. These extensions would enhance an already very useful system.

7.2. The ILLIPSE Parser-Generating System

Over the past few years, work has been underway at the University of Illinois on an interactive parser-generator system [Mickunas, 81], [Mickunas, 86]. The ILLInois Parsing System Editor (ILLIPSE) permits a user to build, examine, modify and test context-free grammars interactively. The grammar to be processed is specified in a BNF-style format. The user selects the type of parser to be generated; LR(1), LALR(1), SLR(1), and NSLR(1)¹ parsers are supported. The user then controls the generation of the sets of items for the parse states of the parser; states can be generated singly or all at once. The user can then traverse the state tables by state number or by transition, adding and deleting items, lookaheads, and transitions. Test strings can be entered and parsed to check the behavior of the parser.
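To make "generating the sets of items for the parse states" concrete, here is a minimal Python sketch of the canonical LR(0) item-set construction (closure and goto), which the SLR(1), LALR(1) and LR(1) constructions extend with lookahead information. It is not ILLIPSE's code; the toy grammar, its encoding, and the function names are assumptions. `goto_state` produces one successor state at a time, as when states are generated singly, while `all_states` builds the whole collection at once.

```python
# Toy grammar: (lhs, rhs) pairs; rule 0 is the augmented start rule.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> .gamma for every item with the dot before a non-terminal B.

    An LR(0) item is a pair (rule number, dot position).
    """
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (r, 0) not in result:
                    result.add((r, 0))
                    work.append((r, 0))
    return frozenset(result)


def goto_state(state, symbol):
    """Advance the dot over `symbol`: generates a single successor state."""
    moved = {(rule, dot + 1) for rule, dot in state
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved) if moved else None


def all_states():
    """Generate every state and transition at once."""
    start = closure({(0, 0)})
    states, transitions, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for sym in sorted(symbols):
            target = goto_state(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions
```

For this grammar the construction yields six states; the state reached on `id` from the start state is shared with the one reached after `E +`.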

ILLIPSE is a very useful tool for the specification of context-free grammars. Ambiguous grammars can be input, and the ambiguities resolved interactively. Many renaming rules² can thus be eliminated from the grammar, which results in smaller grammars, parse tables, and parse trees than would be possible if an unambiguous grammar without token precedence specifications had to be used. This ability can greatly simplify the task of grammar preparation and yields production rules that more closely match the constructs of the language.

¹ Non-deterministic SLR.

² Renaming rules are production rules containing a single non-terminal on the right-hand side.
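The saving from eliminating renaming rules can be seen by counting nodes. The Python sketch below builds, by hand, the parse trees that the two grammars of Figure 7-2 assign to the expression `a + b`, then counts all nodes and the renaming nodes (non-terminal nodes whose only child is another non-terminal). The tuple encoding of trees is an assumption made for illustration, and a single small expression is not expected to reproduce the 27% figure measured on real Pascal programs.

```python
# A tree is (symbol, [children]) for a non-terminal, or a str for a token.
def leaf(name):
    return ("variable", [name])


def tree_unambiguous(a, b):
    # Figure 7-2(a): <simple_expres> -> <simple_expres> <add_op> <term> | <term>,
    # <term> -> <factor>, <factor> -> <variable>.
    left = ("simple_expres", [("term", [("factor", [leaf(a)])])])
    right = ("term", [("factor", [leaf(b)])])
    return ("simple_expres", [left, ("add_op", ["+"]), right])


def tree_ambiguous(a, b):
    # Figure 7-2(b): <simple_expres> -> <simple_expres> <op> <variable> | <variable>.
    left = ("simple_expres", [leaf(a)])
    return ("simple_expres", [left, ("op", ["+"]), leaf(b)])


def count_nodes(tree):
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1])


def count_renamings(tree):
    """Count non-terminal nodes derived by a renaming rule."""
    if isinstance(tree, str):
        return 0
    _, children = tree
    here = 1 if len(children) == 1 and not isinstance(children[0], str) else 0
    return here + sum(count_renamings(child) for child in children)
```

For `a + b` the first tree has 12 nodes, 5 of them renaming nodes; the second has 8 nodes and a single renaming node.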

7.3. The Olorin Parser-Generator

Work has begun in the SAGA group on a parser-generator that takes a formal language specification in an extended BNF syntax, with support for formally specified, incrementally evaluable semantics [Beshers, 84], [Beshers and Campbell, 85]. A prototype generator is still in the design and implementation phase, but should be available for testing some time within the next year.

7.4. Other Parser-Generators

As already mentioned, the SAGA editor can be used with any parser-generating system that produces tables for which code can be written to meet the requirements of the lexical, syntactic, and semantic interfaces discussed in the previous chapter. Other natural candidates are the lex and yacc programs available on UNIX systems [Lesk, 75], [Johnson, 75]. These programs need some modification, since they were designed as an encapsulated, black-box lexer and parser that perform more work than is appropriate for the SAGA editor. Yacc both performs the syntax analysis and produces the parsed output, but the SAGA editor needs structures that can be incrementally reparsed at a later time; only the syntax tables that yacc produces are usable, since the editor builds its own parser output (the parse tree).

7.5. Summary

Because the editor was designed to be retargetable, the effort that went into producing a language-oriented editor can be applied more widely, greatly reducing the time and effort required to produce an editor for a new language. The design also yields software modules that may be reused, together or separately, in both related and unrelated programs.

The editor's modular structure permits, for example, the reuse of only the parser and its subordinate modules in programs that need to manipulate parse trees automatically under program control. While such a program could feed input to a SAGA editor through a pseudo-teletype interface, it is more efficient to produce a single program that communicates directly with the tokenize and parse routines. Other modules, such as the demand-pager for arrays of records, can find uses in unrelated applications in which a large amount of data is accessed non-sequentially and processed in small pieces.
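The demand-pager is not specified further in this chapter. As an illustration of the general idea only, the Python sketch below keeps fixed-size records in a seekable binary file and reads a page of records into memory only when one of them is touched. At most `max_pages` pages are held, and the least recently used page is written back (if modified) and evicted. The record format, page size, and class name are assumptions.

```python
import io
import struct
from collections import OrderedDict


class PagedRecordArray:
    """Fixed-size records in a seekable binary file, paged in on demand."""

    def __init__(self, f, record_format, records_per_page=64, max_pages=4):
        self.f = f
        self.rec = struct.Struct(record_format)
        self.per_page = records_per_page
        self.page_bytes = self.rec.size * records_per_page
        self.max_pages = max_pages
        self.cache = OrderedDict()       # page number -> [bytearray, dirty flag]

    def _page(self, number):
        if number in self.cache:
            self.cache.move_to_end(number)           # now most recently used
        else:
            if len(self.cache) >= self.max_pages:
                self._evict()
            self.f.seek(number * self.page_bytes)
            data = bytearray(self.f.read(self.page_bytes))
            data.extend(bytes(self.page_bytes - len(data)))  # zero-fill past EOF
            self.cache[number] = [data, False]
        return self.cache[number]

    def _evict(self):
        number, (data, dirty) = self.cache.popitem(last=False)  # LRU page
        if dirty:
            self.f.seek(number * self.page_bytes)
            self.f.write(data)

    def __getitem__(self, index):
        data, _ = self._page(index // self.per_page)
        return self.rec.unpack_from(data, (index % self.per_page) * self.rec.size)

    def __setitem__(self, index, values):
        entry = self._page(index // self.per_page)
        self.rec.pack_into(entry[0], (index % self.per_page) * self.rec.size, *values)
        entry[1] = True

    def flush(self):
        while self.cache:
            self._evict()
        self.f.flush()


if __name__ == "__main__":
    buf = io.BytesIO()
    records = PagedRecordArray(buf, "<ii", records_per_page=4, max_pages=2)
    for i in range(20):
        records[i] = (i, i * i)          # older pages are written back as needed
    print(records[3])                    # (3, 9), read back from the file
    records.flush()
```

An LRU policy suits the editor-like access pattern assumed here, where work clusters around a few regions of a large array.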

Because other parser-generating systems can be interfaced to it, the SAGA editor can take advantage of new systems as they come along, which may provide better support for a particular language than the generator currently in use. The use of modular, reusable software enhances the software development environment, adding power and flexibility to the task of developing software efficiently.
CHAPTER 8
CONCLUSION


This dissertation has shown that a language-oriented editor for context-free languages can be based upon an incremental LR(1) parser with incremental analysis techniques. The editor has been constructed using the recognition approach, which permits it to retain common text editing commands while augmenting them with structure-oriented ones. It can handle full programming languages. It is superior to editors based on the generator approach, which implement subsets of full programming languages, provide restricted editing environments, and are unable to provide many of the operations currently available in text editors.

The editor incorporates a table-driven, incremental parser. The parser provides an environment in which syntactic errors are permitted, which simplifies editing: structural modifications that would be tedious to make in one step can be performed in several pieces, taking the program being edited through several intermediate, incorrect states. Since the parser permits the editor to support text-oriented commands, pre-existing code fragments in text form can be incorporated directly anywhere in the parse tree; no preprocessing is required.

We have presented our parse tree node structure, which adds attributes of direct benefit to an editor. These attributes permit the parse tree to be used directly by the editor's command interpreter and display module. This eliminates the need to keep an additional text representation, along with the additional complexity that would be required to maintain consistency between the textual and structural forms of the data. Since a single data structure suffices for both the parser and the editor, no unparser is needed to retrieve the original syntax and formatting information.

We have also presented a new solution to the handling of comments in syntax trees. It eliminates the restrictions that syntax-directed template editors place upon comments, simplifies the storage and maintenance of comments in the parse tree, and gives uniform access to editor commands, which can reference both comments and syntactically meaningful tokens in the parse tree at the same time.
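The node layout and comment scheme themselves are described in earlier chapters and are not reproduced here. As a rough, hypothetical illustration of why no unparser is needed when formatting lives in the tree, the Python sketch below stores on each terminal node the whitespace and comments that precede it. The program text is then just the in-order concatenation of the terminals, and comments are reached by the same traversal as tokens. The field names, and the choice to attach each comment to the following token, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Terminal:
    kind: str                 # lexical class, e.g. "id" or "op"
    text: str                 # token text exactly as typed
    leading: List[str] = field(default_factory=list)  # whitespace/comments before it


@dataclass
class NonTerminal:
    symbol: str
    children: List[Union["NonTerminal", Terminal]]


def terminals(node) -> Iterator[Terminal]:
    """Left-to-right walk of the leaves; all the display module needs."""
    if isinstance(node, Terminal):
        yield node
    else:
        for child in node.children:
            yield from terminals(child)


def text_of(node) -> str:
    # No unparser: the original formatting is stored in the tree itself.
    return "".join("".join(t.leading) + t.text for t in terminals(node))


def comments_of(node) -> List[str]:
    # Comments are found by the same traversal that finds tokens.
    return [piece for t in terminals(node) for piece in t.leading
            if piece.startswith("{")]  # Pascal-style { ... } comments
```

A command that moves or deletes a subtree then carries its comments and layout with it automatically.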

In the parsing algorithm, we have redefined the reduce operation, proposing an alternative that lets the parser treat non-terminal and terminal tokens uniformly, so that non-terminals may be specified in the input string. We have also combined the parsing action and goto functions into a single action function. Both of these modifications eliminate duplicate code in the incremental parser and improve its efficiency.
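Both modifications can be illustrated on a toy grammar: (1) E → E + id, (2) E → id. In the Python sketch below, one ACTION table indexed by state and symbol replaces the separate action and goto tables. A reduction builds the node and pushes the resulting non-terminal back onto the front of the input, where it is shifted exactly like a terminal. As a consequence, a ready-made subtree can also be supplied directly in the input. The table was built by hand for this grammar, and its encoding is an assumption rather than the dissertation's table format.

```python
from collections import deque

# Toy grammar: rule 1  E -> E + id,  rule 2  E -> id.
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule number -> (lhs, length of rhs)

# One combined table: terminals and non-terminals are looked up alike,
# so no separate goto function is needed.
ACTION = {
    (0, "id"): ("shift", 2), (0, "E"): ("shift", 1),
    (1, "+"): ("shift", 3), (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "id"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}


def parse(items):
    """Parse terminals and/or (non_terminal, subtree) pairs into a tuple tree."""
    pending = deque(i if isinstance(i, tuple) else (i, i) for i in items)
    pending.append(("$", "$"))
    states, trees = [0], []
    while True:
        symbol, tree = pending[0]
        kind, arg = ACTION.get((states[-1], symbol), (None, None))
        if kind is None:
            raise SyntaxError(f"unexpected {symbol!r} in state {states[-1]}")
        if kind == "accept":
            return trees[-1]
        if kind == "shift":
            pending.popleft()
            states.append(arg)
            trees.append(tree)
        else:
            # Reduce: pop the handle, build the node, and push the
            # non-terminal back onto the input instead of consulting goto.
            lhs, n = RULES[arg]
            node = (lhs,) + tuple(trees[-n:])
            del states[-n:]
            del trees[-n:]
            pending.appendleft((lhs, node))
```

For example, `parse(["id", "+", "id"])` returns `("E", ("E", "id"), "+", "id")`, and an existing subtree such as `("E", ("E", "x"))` may stand in place of the first `id`.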

Explicit error-handling actions have been introduced, since a working editor must be able to recover from a user's syntax errors. The error-recovery algorithm handles multiple syntax errors and permits editing of the parse tree in the midst of errors.

The editor is screen-oriented: it displays the terminal nodes of the parse tree in text form, and no non-terminal nodes appear on the screen, so the programmer need not know the specific construction of the production rules in the grammar defining the language in order to use the editor. A command is provided to display the set of legal tokens that can appear at a given location in the tree. This feature can aid programmers who are learning a new language, and can also provide diagnostic support for repairing difficult syntax errors.
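With a table-driven LR parser, the legal tokens at a point in the tree can be read off the parse table: they are the terminals with a non-error entry in the parser state at that point. The Python sketch below assumes an action table keyed by (state, symbol) and that the editor can recover the parser state for the cursor position; the table fragment and state numbers are hypothetical. In tables that use default reductions, the pending reductions would first have to be performed before reading off the set, a step omitted here.

```python
# Hypothetical fragment of an action table for a Pascal-like grammar.
ACTION = {
    (3, "not"): ("shift", 6),            # state just after "if"
    (3, "("): ("shift", 5),
    (3, "<identifier>"): ("shift", 8),   # generic lexical class: a terminal
    (3, "<expr>"): ("shift", 7),         # non-terminal entry (the goto part)
    (7, "then"): ("shift", 11),          # state after "if <expr>"
}
NONTERMINALS = {"<expr>"}


def legal_tokens(state, table=ACTION, nonterminals=NONTERMINALS):
    """Terminals with a non-error entry in `state`, sorted for display."""
    return sorted(sym for (s, sym) in table
                  if s == state and sym not in nonterminals)
```

An empty result would indicate a state the editor cannot continue from without error recovery.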

The editor is flexible and supports a higher-level command interface that includes both structure-oriented commands and common text editing commands. It can be used to develop practical programs that incorporate software engineering principles concerning the design and construction of software systems. A prototype editor employing these algorithms was implemented, beginning in 1981, to demonstrate the practicality and flexibility of this approach; it has been in experimental use over the past couple of years.

The editor is part of the SAGA system, which is directed towards experienced programmers, who, if anything, need additional editing flexibility and analysis rather than a tightly constrained environment with restrictive editing options.

The editor's modular structure keeps the majority of its code language-independent, so that the code can be reused when constructing editors for other languages. Because the editor is based upon standard table-driven LR parser technology, it can be used with many existing, independently developed parser-generator programs, which broadens its applicability.

In summary, the construction of a language-oriented editor based upon the recognition approach is very flexible and has several advantages:


1) The technique can be applied consistently to the lexical, syntactic, and semantic components of the language. (Many language-oriented editors based on a generation approach nevertheless depend upon the recognition of valid primitive expressions of the language.) We believe this consistency simplifies the implementation of a uniform set of basic editing commands, such as insert, delete, move and copy, for the lexical, syntactic and semantic components of the language.

2) The approach permits arbitrary editing operations on the program. Editors that use the generation approach cannot permit arbitrary changes and often require particular syntactic transformations to be implemented as special cases.

3) The approach facilitates program maintenance and modification. It is often simpler to transform an existing program into a desired program if the editing commands can take a program through a sequence of intermediate invalid forms. In addition, these invalid programs may be saved between editing sessions. Such program transformations are difficult to implement using an editor based on the generator approach.

4) Arbitrary lines of existing program text can be inserted anywhere into the text of a new program. This allows the editor to be used to combine two different versions of a program in an arbitrary manner to produce a new version.
5) The design of a facility to produce language-oriented editors is simplified if existing compiler generation and parsing techniques and tools can be employed without major alteration. If standard compiler generation and parsing techniques are used, then many existing specifications of the lexical, syntactic, and semantic components of a programming language can be used directly by the facility to produce corresponding language-oriented editors.

The editor runs on a DEC VAX 11/780 under the 4.2BSD UNIX operating system. Editors have been created for Pascal, C, Ada, and FP. Most experimentation has involved the Pascal editor. We have found that the editor performs enough additional processing that a fast or dedicated processor is necessary to provide reasonable response times, but that with such a processor the response time perceived by the user is as good as with a text editor. The parse trees for Pascal, using an unambiguous grammar, take about ten times as much space as the equivalent text representation. By using an ambiguous grammar and eliminating renaming rules in expressions, we have reduced the size of the tree to seven times that required for the text. Additional semantic information will increase this size somewhat.
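As a quick worked example of these ratios, consider a hypothetical 20,000-byte Pascal source file:

```latex
20\,000 \times 10 = 200\,000 \ \text{bytes (unambiguous grammar)}, \qquad
20\,000 \times 7 = 140\,000 \ \text{bytes (ambiguous grammar)}, \qquad
\frac{10 - 7}{10} = 30\%
```

That is, about 30% less tree storage, roughly in line with the 27% reduction in node count measured for the grammar change of Figure 7-2.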

Since a dedicated processor is desirable, the editor has been ported to a workstation environment: it runs on a Sun workstation under the 4.2BSD UNIX operating system. We have found that a workstation provides an ideal environment for such an editor, since its processing power is adequate for the editor's needs and the large amount of available memory permits efficient editing of large programs. Response time is adequate, and the multi-process window environment provided by the system software promotes good interaction between the editor and the other tools used in a software development environment.

Looking beyond the editor to the development environment in which it is beginning to be used, the parse trees produced by the editor can serve as a uniform data structure for many other tools. Additional programs can easily be written to perform further analyses or operations on these parse trees. Editors can be produced for specification and design languages, and tools written to transform the parse trees produced at one level into a form suitable for the next. Applied in an integrated modular environment, the editor can take advantage of dependency information to generate displays showing the interrelationships among the components of a system under development; research into such an environment has been performed [Kirslis et al., 85] and is continuing [Terwilliger and Campbell, 86].

The editor has also been used to support the research in several Master's theses: one covering the analysis of changes to semantic scopes [Badger, 84], another implementing a symbol table for use with the editor [Richards, 84], and a third interfacing the editor's parse tree to a code generator [Kimball, 85]. It has also supported the development of software tools written as class projects for software engineering courses offered by the Department of Computer Science at the University of Illinois; among these were a tree transformation tool for Pascal and a program slicer for data flow analysis. The editor is being extended to include semantic analysis [Beshers, 84], a table-driven lexical analyzer based on lex, and a table-driven command interpreter that will permit formal specification of the editor's command language.

Research into an editor based on the recognition approach has shown the approach to be feasible, and our initial experiments lead us to believe that the prototype editor implemented with this approach can be refined into a practical tool that will better support programmers and enhance the software development process.
APPENDIX A

LALR(1) GRAMMARS


The LALR(1) grammars used to produce SAGA editors for the Ada, FP, and Pascal programming languages are collected here. The grammars are presented as they appear in the listing file produced by the Mystro parser-generator. Some additional statistics about the parsers generated by Mystro are also presented.

The grammar for Ada [ARM, 83] is based upon [Wetherell, 81], with some corrections. We have not yet tested our editor against the validation suite supplied by the Ada Joint Program Office, but we have run numerous tests, all of which the grammar has passed. The grammar for the Functional Programming language FP [Backus, 78] is based upon the 4.2BSD UNIX implementation [Baden, 83]. The Pascal grammar is based upon the description in [Jensen and Wirth, 74], revised to include specific constructs permitted by the Berkeley 4.2 Pascal compiler.

For lexical analysis, the Mystro parser-generator requires Pascal code fragments to be written that recognize the generic classes of the terminal tokens of the language. (The reserved words, operators, and punctuation are collected and put into a table by the parser-generator.) These fragments are included in a lexical analysis module. Since the lexical classes of each of these languages are readily available in the languages' user manuals, the code fragments giving the lexical specifications have been omitted here to conserve space. In the grammars presented here, these generic lexical classes are represented by non-terminal tokens of the form <class...>, which appear on the right-hand sides of productions and have no left-hand-side definition.
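The division of labor described above, with hand-written recognizers for the generic classes and a generated table of reserved words, operators and punctuation, can be sketched as follows. This is a hypothetical Python illustration for a Pascal-like subset, not Mystro's generated code. `RESERVED` and `SYMBOLS` stand in for the table the parser-generator builds, and the class names `<identifier>` and `<number>` are assumed examples of the `<class...>` tokens.

```python
# Stand-ins for the table the parser-generator collects (hypothetical subset).
RESERVED = {"begin", "end", "if", "then", "else", "while", "do",
            "div", "mod", "and", "or", "not"}
SYMBOLS = sorted([":=", "<=", ">=", "<>", "+", "-", "*", "/", "=", "<", ">",
                  "(", ")", ";", ",", ":"], key=len, reverse=True)  # longest first


def tokenize(text):
    """Yield (class, lexeme); reserved words and symbols are their own class."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():                      # generic class: identifiers
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            kind = word.lower() if word.lower() in RESERVED else "<identifier>"
            yield kind, word
            i = j
        elif ch.isdigit():                      # generic class: unsigned integers
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            yield "<number>", text[i:j]
            i = j
        else:                                   # table lookup: operators, punctuation
            for sym in SYMBOLS:
                if text.startswith(sym, i):
                    yield sym, sym
                    i += len(sym)
                    break
            else:
                raise ValueError(f"illegal character {ch!r} at offset {i}")
```

Only the two `elif` branches for generic classes correspond to the hand-written fragments; everything driven by `RESERVED` and `SYMBOLS` could be regenerated from the grammar.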

Mystro permits Pascal code fragments that perform semantic actions to accompany the production rules of the grammar. These fragments are collected into a case statement indexed by production rule number. This case statement is included in the SAGA editor and is executed each time a reduction is performed during the parse. No semantic actions are shown with the grammars presented here.

The binary parse tables for Ada take about 20k bytes of storage; for FP, about 4k bytes; and for Pascal, about 8k bytes. These figures include storage for the text names of the non-terminal tokens in the grammar.

When a grammar is analyzed, Mystro produces some additional statistics about the parser. These statistics are presented below, followed by the three grammars.


Ada Parser Statistics

A total of 432 rules containing 292 symbols were read from "Ada.g."
470 states and 10014 items have been constructed. Compute slr(1) follow set.
15 collisions are not slr(1)-resolvable, but all states are at least lalr(1).
6470 actions constructed for the actions array.

FP Parser Statistics

A total of 100 rules containing 97 symbols were read from "FP.g."
31 states and 1092 items have been constructed. Compute slr(1) follow set.
1 collision is not slr(1)-resolvable, but all states are at least lalr(1).
1029 actions constructed for the actions array.

Pascal Parser Statistics

A total of 217 rules containing 166 symbols were read from "Pascal.g."
209 states and 2537 items have been constructed. Compute slr(1) follow set.
3 collisions are not slr(1)-resolvable, but all states are at least lalr(1).
1929 actions constructed for the actions array.
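The statistics refer to computing SLR(1) follow sets and to collisions that those sets do not resolve but LALR(1) lookaheads do. As a refresher on what the follow-set step computes, the Python sketch below implements the standard FIRST/FOLLOW fixed-point computation for a grammar given as (lhs, rhs) pairs, allowing empty right-hand sides as in several rules of the Ada grammar below. It is a textbook formulation, not Mystro's code, and the small test grammar is an assumption loosely modeled on the Ada context clauses.

```python
def first_and_follow(grammar, start, end_marker="$"):
    """grammar: list of (lhs, tuple_of_symbols); () is an empty rule."""
    nonterminals = {lhs for lhs, _ in grammar}
    first = {n: set() for n in nonterminals}
    nullable = set()

    def first_of(seq):
        """FIRST of a symbol string, and whether the whole string can vanish."""
        out = set()
        for sym in seq:
            if sym not in nonterminals:
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                       # FIRST sets and nullability
        changed = False
        for lhs, rhs in grammar:
            f, vanishes = first_of(rhs)
            if not f <= first[lhs] or (vanishes and lhs not in nullable):
                first[lhs] |= f
                if vanishes:
                    nullable.add(lhs)
                changed = True

    follow = {n: set() for n in nonterminals}
    follow[start].add(end_marker)
    changed = True
    while changed:                       # FOLLOW sets
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                # FOLLOW(sym) gets FIRST of what follows it, plus FOLLOW(lhs)
                # when everything after sym can derive the empty string.
                f, vanishes = first_of(rhs[i + 1:])
                new = f | (follow[lhs] if vanishes else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return first, follow
```

An SLR(1) parser reduces by A → α only on tokens in FOLLOW(A); a collision that remains under this test can still vanish with LALR(1)'s state-specific lookaheads.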




Mystro Translator Writing System Version 7.0, June 1983

Ada grammar Page 1

Input grammar. Grammar option. Default: on

The goal symbol <system_goal_symbol> is found in rule 1.


[  1] <system_goal_symbol> ::= <compilation_eof>
[  2] <compilation_eof> ::= <compilation> <eof>
[  3] <compilation_eof> ::= <eof>
[  4] <compilation> ::= <compilation_unit>
[  5] <compilation> ::= <compilation> <compilation_unit>
[  6] <compilation_unit> ::= <context_spec> <subpgm_decl>
[  7] <compilation_unit> ::= <context_spec> <subpgm_body>
[  8] <compilation_unit> ::= <context_spec> <pkg_decl>
[  9] <compilation_unit> ::= <context_spec> <pkg_body>
[ 10] <compilation_unit> ::= <context_spec> <subunit>
[ 11] <context_spec> ::= <with_use_list>
[ 12] <with_use_list> ::=
[ 13] <with_use_list> ::= <with_use_list> <with_clause>
[ 14] <with_use_list> ::= <with_use_list> <with_clause> <use_clause>
[ 15] <with_use_list> ::= <with_use_list> <pragma>
[ 16] <with_clause> ::= with <unit_name_list> ;
[ 17] <unit_name_list> ::= <unit_name>
[ 18] <unit_name_list> ::= <unit_name_list> , <unit_name>
[ 19] <pragma> ::= pragma <identifier> ;
[ 20] <pragma> ::= pragma <identifier> <arg_list> ;
[ 21] <use_clause> ::= use <pkg_name_list> ;
[ 22] <pkg_name_list> ::= <pkg_name>
[ 23] <pkg_name_list> ::= <pkg_name_list> , <pkg_name>
[ 24] <subpgm_decl> ::= <subpgm_spec> ;
[ 25] <subpgm_decl> ::= <gnrc_subpgm_decl>
[ 26] <subpgm_decl> ::= <gnrc_subpgm_inst>
[ 27] <subpgm_spec> ::= procedure <identifier>
[ 28] <subpgm_spec> ::= procedure <identifier> <frml_part>
[ 29] <subpgm_spec> ::= function <designator> return <subtype_ind>
[ 30] <subpgm_spec> ::= function <designator> <frml_part> return <subtype_ind>
[ 31] <designator> ::= <identifier>
[ 32] <designator> ::= <op_symbol>
[ 33] <frml_part> ::= ( <parm_decl_list> )
[ 34] <parm_decl_list> ::= <parm_decl>
[ 35] <parm_decl_list> ::= <parm_decl_list> ; <parm_decl>
[ 36] <parm_decl> ::= <identifier_list> : <mode> <subtype_ind>
[ 37] <parm_decl> ::= <identifier_list> : <mode> <subtype_ind> := <expr>
[ 38] <mode> ::=
[ 39] <mode> ::= in
[ 40] <mode> ::= out
[ 41] <mode> ::= in out
[ 42] <subpgm_body> ::= <subpgm_spec> is <decl_part> begin <seq_of_stmts> <excepts_opt> end <designator_opt> ;
[ 43] <excepts_opt> ::=
[ 44] <excepts_opt> ::= exception <except_hand_list>
[ 45] <except_hand_list> ::= <except_handler>
[ 46] <except_hand_list> ::= <except_hand_list> <except_handler>
[ 47] <designator_opt> ::=
[ 48] <designator_opt> ::= <designator>
[ 49] <pkg_decl> ::= <pkg_spec> ;
[ 50] <pkg_decl> ::= <gnrc_pkg_decl>
[ 51] <pkg_decl> ::= <gnrc_pkg_inst>
[ 52] <pkg_spec> ::= package <identifier> is <decl_item_list> <private_part_opt> end
[ 53] <pkg_spec> ::= package <identifier> is <decl_item_list> <private_part_opt> end <identifier>
[ 54] <pkg_body> ::= package body <identifier> is <decl_part> <pkg_body_part_opt> end ;
[ 55] <pkg_body> ::= package body <identifier> is <decl_part> <pkg_body_part_opt> end <identifier> ;
[ 56] <decl_item_list> ::=
[ 57] <decl_item_list> ::= <decl_item> <decl_item_list>
[ 58] <private_part_opt> ::=
[ 59] <private_part_opt> ::= private <decl_item_list>
[ 60] <repr_spec_list> ::=
[ 61] <repr_spec_list> ::= <repr_spec_list> <pragma>
[ 62] <repr_spec_list> ::= <repr_spec_list> <repr_spec>
[ 63] <pkg_body_part_opt> ::=
[ 64] <pkg_body_part_opt> ::= begin <seq_of_stmts> <excepts_opt>
[ 65] <subunit> ::= separate ( <unit_name> ) <body>
[ 66] <body_stub> ::= <subpgm_spec> is separate ;
[ 67] <body_stub> ::= package body <identifier> is separate ;
[ 68] <body_stub> ::= task body <identifier> is separate ;
[ 69] <decl_part> ::=
[ 70] <decl_part> ::= <decl_part> <decl_item>
[ 71] <decl_part> ::= <decl_part> <pgm_comp>
[ 72] <decl_item> ::= <decl>
[ 73] <decl_item> ::= <repr_spec>
[ 74] <decl_item> ::= <use_clause>
[ 75] <decl_item> ::= <pragma>
[ 76] <pgm_comp> ::= <body>
[ 77] <pgm_comp> ::= <body_stub>
[ 78] <body> ::= <subpgm_body>
[ 79] <body> ::= <pkg_body>
[ 80] <body> ::= <task_body>
[ 81] <task_decl> ::= <task_spec>
[ 82] <task_spec> ::= task <identifier> ;
[ 83] <task_spec> ::= task type <identifier> ;
[ 84] <task_spec> ::= task <identifier> <task_spec_part> ;
[ 85] <task_spec> ::= task type <identifier> <task_spec_part> ;
[ 86] <task_spec_part> ::= is <entry_decl_list> <repr_spec_list> end
[ 87] <task_spec_part> ::= is <entry_decl_list> <repr_spec_list> end <identifier>
[ 88] <entry_decl_list> ::=
[ 89] <entry_decl_list> ::= <entry_decl_list> <entry_decl>
[ 90] <task_body> ::= task body <identifier> is <decl_part> begin <seq_of_stmts> <excepts_opt> end ;
[ 91] <task_body> ::= task body <identifier> is <decl_part> begin <seq_of_stmts> <excepts_opt> end <identifier> ;
[ 92] <decl> ::= <object_decl>
[ 93] <decl> ::= <type_decl>
[ 94] <decl> ::= <subpgm_decl>
[ 95] <decl> ::= <task_decl>
[ 96] <decl> ::= <renaming_decl>
[ 97] <decl> ::= <number_decl>
[ 98] <decl> ::= <subtype_decl>
[ 99] <decl> ::= <pkg_decl>
[100] <decl> ::= <except_decl>
[101] <object_decl> ::= <identifier_list> : <subtype_ind> <init_opt> ;
[102] <object_decl> ::= <identifier_list> : <array_type_def> <init_opt>
[103] <object_decl> ::= <identifier_list> : constant <subtype_ind> <init_opt> ;
[104] <object_decl> ::= <identifier_list> : constant <array_type_def> <init_opt> ;
[105] <init_opt> ::=
[106] <init_opt> ::= := <expr>
[107] <number_decl> ::= <identifier_list> : constant := <literal_expr>
[108] <identifier_list> ::= <identifier>
[109] <identifier_list> ::= <identifier_list> , <identifier>
[110] <type_decl> ::= type <identifier> <discr_part_opt> is <type_def> ;
[111] <type_decl> ::= <incompl_type_decl>
[112] <discr_part_opt> ::=
[113] <discr_part_opt> ::= <discr_part>
[114] <type_def> ::= <enum_type_def>
[115] <type_def> ::= <real_type_def>
[116] <type_def> ::= <record_type_def>
[117] <type_def> ::= <derived_type_def>
[118] <type_def> ::= <integer_type_def>
[119] <type_def> ::= <array_type_def>
[120] <type_def> ::= <access_type_def>
[121] <type_def> ::= <private_type_def>
[122] <subtype_decl> ::= subtype <identifier> is <subtype_ind> ;
[123] <subtype_ind> ::= <name>
[124] <subtype_ind> ::= <name> <range_constr>
[125] <subtype_ind> ::= <name> <accuracy_constr>
[126] <derived_type_def> ::= new <subtype_ind>
[127] <range_constr> ::= range <range>
[128] <range> ::= <simple_expr> .. <simple_expr>
[129] <enum_type_def> ::= ( <enum_literal_list> )
[130] <enum_literal_list> ::= <enum_literal>
[131] <enum_literal_list> ::= <enum_literal_list> , <enum_literal>
[132] <enum_literal> ::= <identifier>
[133] <enum_literal> ::= <character>
[134] <integer_type_def> ::= <range_constr>
[135] <real_type_def> ::= <accuracy_constr>
[136] <accuracy_constr> ::= <float_pt_constr>
[137] <accuracy_constr> ::= <fixed_pt_constr>
[138] <float_pt_constr> ::= digits <static_simple_expr>
[139] <float_pt_constr> ::= digits <static_simple_expr> <range_constr>
[140] <fixed_pt_constr> ::= delta <static_simple_expr>
[141] <fixed_pt_constr> ::= delta <static_simple_expr> <range_constr>
[142] <array_type_def> ::= array ( <index_list> ) of <comp_subtype_ind>
[143] <array_type_def> ::= array <arg_list> of <comp_subtype_ind>
[144] <index_list> ::= <index>
[145] <index_list> ::= <index_list> , <index>
[146] <index> ::= <name> range <>
[147] <discrete_range> ::= <name>
[148] <discrete_range> ::= <name> <range_constr>
[149] <discrete_range> ::= <range>
[150] <record_type_def> ::= record <comp_list> end record
[151] <comp_list> ::= <comp_decl_list>
[152] <comp_list> ::= <comp_decl_list> <variant_part>
[153] <comp_list> ::= null ;
[154] <comp_decl_list> ::=
[155] <comp_decl_list> ::= <comp_decl_list> <comp_decl>
[156] <comp_decl> ::= <identifier_list> : <subtype_ind> <init_opt> ;
[157] <comp_decl> ::= <identifier_list> : <array_type_def> <init_opt>


[158] <discr_part> ::= ( <discr_decl_list> )
[159] <discr_decl_list> ::= <discr_decl>
[160] <discr_decl_list> ::= <discr_decl_list> ; <discr_decl>
[161] <discr_decl> ::= <identifier_list> : <subtype_ind> <init_opt>
[162] <variant_part> ::= case <name> is <variant_elt_list> end case ;
[163] <variant_elt_list> ::=
[164] <variant_elt_list> ::= <variant_elt_list> when <choice_list> => <comp_list>
[165] <choice_list> ::= <choice>
[166] <choice_list> ::= <choice_list> | <choice>
[167] <choice> ::= <simple_expr>
[168] <choice> ::= <name> <range_constr>
[169] <choice> ::= <range>
[170] <choice> ::= others
[171] <access_type_def> ::= access <subtype_ind>
[172] <incompl_type_decl> ::= type <identifier> ;
[173] <incompl_type_decl> ::= type <identifier> <discr_part> ;
[174] <expr> ::= <rel>
[175] <expr> ::= <rel_and_list>
[176] <expr> ::= <rel_or_list>
[177] <expr> ::= <rel_xor_list>
[178] <expr> ::= <rel_and_then_list>
[179] <expr>
[180] <expr>
[181] <rel_and_list>
[182] <rel_and_list>
[183] <rel_or_list>
[184] <rel_or_list>
[185] <rel_xor_list>
[186] <rel_xor_list>
[187] <rel_and_then_list>
[188] <rel_and_then_list>
[189] <rel_or_else_list>
[190] <rel_or_else_list>
[191] <rel>
[192] <rel>
[193] <rel>
[194] <rel>
[195] <rel>
[196] <simple_expr_list>
[197] <simple_expr_list>
[198] <simple_expr>
[199] <simple_expr>
[200] <term_list>
[201] <term_list>
[202] <term>
[203] <factor_list>
[204] <factor_list>
[205] <factor>
[206] <primary_list>
[207] <primary_list>
[208] <primary>
[209] <primary>
[210] <primary>
[211] <primary>
[212] <primary>
[213] <rel_op>
[214] <rel_op>
[215] <rel_op>
[216] <rel_op>
[217] <rel_op>
[218] <rel_op>
[219] <add_op>
[220] <add_op>
[221] <add_op>
[222] <unary_op>
[223] <unary_op>
[224] <unary_op>
[225] <mult_op>
[226] <mult_op>
[227] <mult_op>
[228] <mult_op>

::= <rel_or_else_list>
::= <classplace>
::= <rel> and <rel>
::= <rel_and_list> and <rel>
::= <rel> or <rel>
::= <rel_or_list> or <rel>
::= <rel> xor <rel>
::= <rel_xor_list> xor <rel>
::= <rel> and then <rel>
::= <rel_and_then_list> and then <rel>
::= <rel> or else <rel>
::= <rel_or_else_list> or else <rel>
::= <simple_expr_list>
::= <simple_expr> in <subtype_ind>
::= <simple_expr> in <range>
::= <simple_expr> not in <subtype_ind>
::= <simple_expr> not in <range>
::= <simple_expr>
::= <simple_expr_list> <rel_op> <simple_expr>
::= <term_list>
::= <unary_op> <term_list>
::= <term>
::= <term_list> <add_op> <term>
::= <factor_list>
::= <factor>
::= <factor_list> <mult_op> <factor>
::= <primary> <primary_list>
::= ** <primary>
::= <literal>
::= <aggregate>
::= <name>
::= <allocator>
::= <qualified_expr>
::= <
::= >
::= +
::= *
::= +
::= not
::= *
::= /
::= mod
::= rem
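Rules [174]-[190] encode an Ada restriction on logical operators. A chain of relations may repeat one operator (and, or, xor, and then, or else), but different operators cannot be mixed at the same level, because each <rel_..._list> nonterminal only extends itself with its own operator. Below is a minimal sketch of that check. The function classify_expr and its token format are hypothetical: relations are opaque strings, and the two-word operators are passed as single tokens. It is not part of the grammar listing or the SAGA tools.

```python
# Logical operators introduced by rules [181]-[190]; the two-word
# short-circuit forms are passed in as single tokens.
LIST_FOR_OP = {
    "and": "<rel_and_list>",
    "or": "<rel_or_list>",
    "xor": "<rel_xor_list>",
    "and then": "<rel_and_then_list>",
    "or else": "<rel_or_else_list>",
}


def classify_expr(tokens):
    """Return the nonterminal through which <expr> could derive `tokens`.

    `tokens` alternates relations and logical operators, for example
    ["A", "and", "B", "and", "C"].  The result is "<rel>" for a single
    relation, the matching <rel_..._list> for a one-operator chain, or
    None when the sequence is not derivable (bad shape or mixed operators).
    """
    if len(tokens) % 2 == 0:
        return None                      # must be: rel (op rel)*
    rels, ops = tokens[0::2], tokens[1::2]
    if any(r in LIST_FOR_OP for r in rels):
        return None                      # an operator where a relation belongs
    if not ops:
        return "<rel>"
    if any(op not in LIST_FOR_OP for op in ops):
        return None
    if len(set(ops)) != 1:
        return None                      # e.g. A and B or C is rejected
    return LIST_FOR_OP[ops[0]]


print(classify_expr(["A", "and", "B", "and", "C"]))   # <rel_and_list>
print(classify_expr(["A", "and", "B", "or", "C"]))    # None
```

Under this grammar, A and B or C must therefore be written (A and B) or C. The parenthesized operand is derived as a single <primary> (through <aggregate>, rule [245]), so the outer chain again uses only one operator.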
[229] <name> ::= <identifier>
[230] <name> ::= <name> <arg_list>
[231] <name> ::= <selected_comp>
[232] <name> ::= <attr>
[233] <name> ::= <op_symbol>
[234] <selected_comp> ::= <name> . <identifier>
[235] <selected_comp> ::= <name> . all
[236] <selected_comp> ::= <name> . <op_symbol>
[237] <attr> ::= <name> ' <identifier>
[238] <attr> ::= <name> ' delta
[239] <attr> ::= <name> ' digits
[240] <attr> ::= <name> ' range
[241] <literal> ::= <numeric_literal>
[242] <literal> ::= <string>
[243] <literal> ::= <character>
[244] <literal> ::= null
[245] <aggregate> ::= ( <comp_assoc_list> )
[246] <comp_assoc_list> ::= <comp_assoc>
[247] <comp_assoc_list> ::= <comp_assoc_list> , <comp_assoc>
[248] <comp_assoc> ::= <expr>
[249] <comp_assoc> ::= <choice_list> => <expr>
[250] <qualified_expr> ::= <name> ' <aggregate>
[251] <allocator> ::= new <name>
[252] <allocator> ::= new <qualified_expr>
[253] <seq_of_stmts> ::= <stmt>
[254] <seq_of_stmts> ::= <seq_of_stmts> <stmt>
[255] <stmt> ::= <simple_stmt>
[256] <stmt> ::= <compound_stmt>
[257] <stmt> ::= <pragma>
[258] <stmt> ::= <label_list> <simple_stmt>
[259] <stmt> ::= <label_list> <compound_stmt>
[260] <stmt> ::= <classplace>
[261] <label_list> ::= <label>
[262] <label_list> ::= <label_list> <label>
[263] <simple_stmt> ::= <null_stmt>
[264] <simple_stmt> ::= <assign_stmt>
[265] <simple_stmt> ::= <return_stmt>
[266] <simple_stmt> ::= <proc_or_entry_call>
[267] <simple_stmt> ::= <delay_stmt>
[268] <simple_stmt> ::= <raise_stmt>
[269] <simple_stmt> ::= <exit_stmt>
[270] <simple_stmt> ::= <goto_stmt>
[271] <simple_stmt> ::= <abort_stmt>
[272] <simple_stmt> ::= <code_stmt>
[273] <compound_stmt> ::= <if_stmt>
[274] <compound_stmt> ::= <loop_stmt>
[275] <compound_stmt> ::= <accept_stmt>
[276] <compound_stmt> ::= <case_stmt>
[277] <compound_stmt> ::= <block>
[278] <compound_stmt> ::= <select_stmt>
[279] <label> ::= << <identifier> >>
[280] <null_stmt> ::= null ;
[281] <assign_stmt> ::= <name> := <expr> ;
[282] <if_stmt> ::= if <cond> then <seq_of_stmts> <elsif_list> end if ;
[283] <if_stmt> ::= if <cond> then <seq_of_stmts> end if ;
[284] <if_stmt> ::= if <cond> then <seq_of_stmts> <elsif_list> else <seq_of_stmts> end if ;
[285] <if_stmt> ::= if <cond> then <seq_of_stmts> else <seq_of_stmts> end if ;
[286] <elsif_list> ::= elsif <cond> then <seq_of_stmts>
[287] <elsif_list> ::= <elsif_list> elsif <cond> then <seq_of_stmts>
[288] <cond> ::= <boolean_expr>
[289] <case_stmt> ::= case <expr> is <when_list> end case ;
[290] <when_list> ::=
[291] <when_list> ::= <when_list> when <choice_list> => <seq_of_stmts>
[292] <loop_stmt> ::= <basic_loop> ;
[293] <loop_stmt> ::= <iteration_clause> <basic_loop> ;
[294] <loop_stmt> ::= <identifier> : <basic_loop> <identifier> ;
[295] <loop_stmt> ::= <identifier> : <iteration_clause> <basic_loop> <identifier> ;
[296] <basic_loop> ::= loop <seq_of_stmts> end loop
[297] <iteration_clause> ::= for <loop_parm> in <discrete_range>
[298] <iteration_clause> ::= for <loop_parm> in reverse <discrete_range>
[299] <iteration_clause> ::= while <cond>
[300] <loop_parm> ::= <identifier>
[301] <block> ::= begin <seq_of_stmts> <excepts_opt> end ;
[302] <block> ::= declare <decl_part> begin <seq_of_stmts> <excepts_opt> end ;
[303] <block> ::= <identifier> : begin <seq_of_stmts> <excepts_opt> end <identifier> ;
[304] <block> ::= <identifier> : declare <decl_part> begin <seq_of_stmts> <excepts_opt> end <identifier>
[305] <exit_stmt> ::= exit ;
[306] <exit_stmt> ::= exit <loop_name> ;
[307] <exit_stmt> ::= exit when <cond> ;
[308] <exit_stmt> ::= exit <loop_name> when <cond> ;
[309] <return_stmt> ::= return ;
[310] <return_stmt> ::= return <expr> ;
[311] <goto_stmt> ::= goto <label_name> ;
[312] <proc_or_entry_call> ::= <name> ;
[313] <entry_decl> ::= entry <identifier> ;
[314] <entry_decl> ::= entry <identifier> ( <discrete_range> ) ;
[315] <entry_decl> ::= entry <identifier> <frml_part> ;
[316] <entry_decl> ::= entry <identifier> ( <discrete_range> ) <frml_part> ;
[317] <accept_stmt> ::= accept <entry_name> ;
[318] <accept_stmt> ::= accept <entry_name> do <seq_of_stmts> end ;
[319] <accept_stmt> ::= accept <entry_name> do <seq_of_stmts> end
[320] <entry_name>
[321] <entry_name>
[322] <entry_name>
[323] <entry_name>
[324] <entry_index>
[325] <delay_stmt>
[326] <select_stmt>
[327] <select_stmt>
[328] <select_stmt>
[329] <selective_wait>
[330] <when_part_opt>
[331] <when_part_opt>
[332] <or_part_list>
[333] <or_part_list>
[334] <or_part>
[335] <or_part>
[336] <else_part_opt>
[337] <else_part_opt>
[338] <select_alt>
[339] <select_alt>
[340] <select_alt>
[341] <seq_of_stmts_opt>
[342] <seq_of_stmts_opt>
[343] <cond_entry_call>
[344] <cond_entry_call>
[345] <timed_entry_call>
[346] <abort_stmt>
[347] <task_name_list>
[348] <task_name_list>
[349] <raise_stmt>
[350] <raise_stmt>
[351] <private_type_def>
[352] <private_type_def>
[353] <renaming_decl>
[354] <renaming_decl>
[355] <renaming_decl>
[356] <renaming_decl>
[357] <renaming_decl>
[358] <except_decl>
[359] <except_handler>
[360] <except_choice_list>
[361] <except_choice_list>
[362] <except_choice>
[363] <except_choice>
[364] <gnrc_subpgm_decl>

<identifier> ;
::= <identifier> ( <entry_index> ) <frml_part>
::= <identifier> <frml_part>
::= <identifier> ( <entry_index> )
::= <identifier>
::= <expr>
::= delay <simple_expr> ;
::= <selective_wait>
::= <cond_entry_call>
::= <timed_entry_call>
::= select <when_part_opt> <select_alt> <or_part_list> <else_part_opt> end select ;
::= when <cond> =>
::= <or_part_list> <or_part>
::= or <select_alt>
::= or when <cond> => <select_alt>
::= else <seq_of_stmts>
::= <accept_stmt> <seq_of_stmts_opt>
::= <delay_stmt> <seq_of_stmts_opt>
::= terminate ;
::= <seq_of_stmts>
::= select <proc_or_entry_call> else <seq_of_stmts> end select ;
::= select <proc_or_entry_call> <seq_of_stmts> else <seq_of_stmts> end select ;
::= select <proc_or_entry_call> <seq_of_stmts_opt> or <delay_stmt> <seq_of_stmts_opt> end select ;
::= abort <task_name_list> ;
::= <task_name>
::= <task_name_list> , <task_name>
::= raise ;
::= raise <except_name> ;
::= private
::= limited private
::= <identifier_list> : <name> renames <name> ;
::= <identifier_list> : exception renames <name> ;
::= package <identifier> renames <name> ;
::= task <identifier> renames <name> ;
::= <subpgm_spec> renames <name> ;
::= <identifier_list> : exception ;
::= when <except_choice_list> => <seq_of_stmts>
::= <except_choice>
::= <except_choice_list> ! <except_choice>
::= <except_name>
::= others
::= <gnrc_part> <subpgm_spec> ;
[365] <gnrc_pkg_decl>
[366] <gnrc_part>
[367] <gnrc_part>
[368] <gnrc_frml_parm_lst>
[369] <gnrc_frml_parm_lst>
[370] <gnrc_frml_parm>
[371] <gnrc_frml_parm>
[372] <gnrc_frml_parm>
[373] <gnrc_frml_parm>
[374] <gnrc_frml_parm>
[375] <gnrc_frml_parm>
[376] <gnrc_type_def>
[377] <gnrc_type_def>
[378] <gnrc_type_def>
[379] <gnrc_type_def>
[380] <gnrc_type_def>
[381] <gnrc_type_def>
[382] <gnrc_type_def>
[383] <gnrc_subpgm_inst>
[384] <gnrc_subpgm_inst>
[385] <gnrc_pkg_inst>
[386] <gnrc_inst>
[387] <gnrc_inst>
[388] <repr_spec>
[389] <repr_spec>
[390] <repr_spec>
[391] <len_or_enum_spec>
[392] <record_type_repr>
[393] <align_clause_opt>
[394] <align_clause_opt>
[395] <loc_clause_list>
[396] <loc_clause_list>
[397] <loc>
[398] <align_clause>
[399] <addr_spec>
[400] <code_stmt>
[401] <arg_list>
[402] <arg_part>
[403] <arg_part>
[404] <arg_item>
[405] <arg_item>
[406] <arg_item>
[407] <arg_item>
[408] <arg_stroke_list>
[409] <arg_stroke_list>
[410] <pkg_name>
[411] <unit_name>
[412] <loop_name>

::= <gnrc_part> <pkg_spec> ;
::= generic
::= generic <gnrc_frml_parm_lst>
::= <gnrc_frml_parm>
::= <gnrc_frml_parm_lst> <gnrc_frml_parm>
::= <parm_decl> ;
::= type <identifier> is <gnrc_type_def> ;
::= type <identifier> <discr_part> is <gnrc_type_def> ;
::= with <subpgm_spec> ;
::= with <subpgm_spec> is <name> ;
::= with <subpgm_spec> is <> ;
::= ( <> )
::= range <>
::= delta <>
::= digits <>
::= <array_type_def>
::= <access_type_def>
::= <private_type_def>
::= <subpgm_spec> is <gnrc_inst> ;
::= function <designator> is <gnrc_inst> ;
::= package <identifier> is <gnrc_inst> ;
::= new <designator>
::= new <designator> <arg_list>
::= <len_or_enum_spec>
::= <record_type_repr>
::= <addr_spec>
::= for <name> use <expr> ;
::= for <name> use record <align_clause_opt> <loc_clause_list> end record ;
::= <align_clause> ;
::= <loc_clause_list> <comp_name> <loc> ;
::= at <static_simple_expr> range <range>
::= at mod <static_simple_expr>
::= for <name> use at <static_simple_expr> ;
::= <qualified_expr> ;
::= ( <arg_part> )
::= <arg_item>
::= <arg_part> , <arg_item>
::= <expr>
::= <name> <range_constr>
::= <range>
::= <arg_stroke_list> => <expr>
::= <name>
::= <name> ! <arg_stroke_list>
::= <name>
::= <name>
::= <name>
[413] <label_name> ::= <name>
[414] <task_name> ::= <name>
[415] <except_name> ::= <name>
[416] <comp_name> ::= <name>
[417] <literal_expr> ::= <expr>
[418] <boolean_expr> ::= <expr>
[419] <static_simple_expr> ::= <simple_expr>
[420] <comp_subtype_ind> ::= <subtype_ind>
[421] <numeric_literal> ::= <real>
[422] <numeric_literal> ::= <integer>
[423] <numeric_literal> ::= <based_real>
[424] <numeric_literal> ::= <based_int>
[425] <identifier> ::= <classid>
[426] <character> ::= <classchar>
[427] <string> ::= <classstr>
[428] <op_symbol> ::= <classop>
[429] <real> ::= <classreal>
[430] <integer> ::= <classint>
[431] <based_real> ::= <classbreal>
[432] <based_int> ::= <classbint>
List of tokens and their token numbers. Tokens option. Default: on

The reserved words and their token numbers are:

 1 abort          2 accept         3 access
 4 all            5 and            6 array
 7 at             8 begin          9 body
10 case          11 constant      12 declare
13 delay         14 delta         15 digits
16 do            17 else          18 elsif
19 end           20 entry         21 exception
22 exit          23 for           24 function
25 generic       26 goto          27 if
28 in            29 is            30 limited
31 loop          32 mod           33 new
34 not           35 null          36 of
37 or            38 others        39 out
40 package       41 pragma        42 private
43 procedure     44 raise         45 range
46 record        47 rem           48 renames
49 return        50 reverse       51 select
52 separate      53 subtype       54 task
55 terminate     56 then          57 type
58 use           59 when          60 while
61 with          62 xor

The angle-bracketed terminals and their token numbers are:

63 <classbint>  64 <classbreal> 65 <classchar>
66 <classid>    67 <classint>   68 <classop>
69 <classplace> 70 <classreal>  71 <classstr>
72 <eof>

The special symbols and their token numbers are:

73 !    74 &    75 '    76 (    77 )    78 *
79 **   80 +    81 ,    82 -    83 .    84 ..
85 /    86 /=   87 :    88 :=   89 ;    90 <
91 <<   92 <=   93 <>   94 =    95 =>   96 >
97 >=   98 >>
The non-terminals and their token numbers are:

 99 <abort_stmt>          100 <accept_stmt>         101 <access_type_def>
102 <accuracy_constr>     103 <add_op>              104 <addr_spec>
105 <aggregate>           106 <align_clause>        107 <align_clause_opt>
108 <allocator>           109 <arg_item>            110 <arg_list>
111 <arg_part>            112 <arg_stroke_list>     113 <array_type_def>
114 <assign_stmt>         115 <attr>                116 <based_int>
117 <based_real>          118 <basic_loop>          119 <block>
120 <body>                121 <body_stub>           122 <boolean_expr>
123 <case_stmt>           124 <character>           125 <choice>
126 <choice_list>         127 <code_stmt>           128 <comp_assoc>
129 <comp_assoc_list>     130 <comp_decl>           131 <comp_decl_list>
132 <comp_list>           133 <comp_name>           134 <comp_subtype_ind>
135 <compilation>         136 <compilation_eof>     137 <compilation_unit>
138 <compound_stmt>       139 <cond>                140 <cond_entry_call>
141 <context_spec>        142 <decl>                143 <decl_item>
144 <decl_item_list>      145 <decl_part>           146 <delay_stmt>
147 <derived_type_def>    148 <designator>          149 <designator_opt>
150 <discr_decl>          151 <discr_decl_list>     152 <discr_part>
153 <discr_part_opt>      154 <discrete_range>      155 <else_part_opt>
156 <elsif_list>          157 <entry_decl>          158 <entry_decl_list>
159 <entry_index>         160 <entry_name>          161 <enum_literal>
162 <enum_literal_list>   163 <enum_type_def>       164 <except_choice>
165 <except_choice_list>  166 <except_decl>         167 <except_hand_list>
168 <except_handler>      169 <except_name>         170 <excepts_opt>
171 <exit_stmt>           172 <expr>                173 <factor>
174 <factor_list>         175 <fixed_pt_constr>     176 <float_pt_constr>
177 <frml_part>           178 <gnrc_frml_parm>      179 <gnrc_frml_parm_lst>
180 <gnrc_inst>           181 <gnrc_part>           182 <gnrc_pkg_decl>
183 <gnrc_pkg_inst>       184 <gnrc_subpgm_decl>    185 <gnrc_subpgm_inst>
186 <gnrc_type_def>       187 <goto_stmt>           188 <identifier>
189 <identifier_list>     190 <if_stmt>             191 <incompl_type_decl>
192 <index>               193 <index_list>          194 <init_opt>
195 <integer>             196 <integer_type_def>    197 <iteration_clause>
198 <label>               199 <label_list>          200 <label_name>
201 <len_or_enum_spec>    202 <literal>             203 <literal_expr>
204 <loc>                 205 <loc_clause_list>     206 <loop_name>
207 <loop_parm>           208 <loop_stmt>           209 <mode>
210 <mult_op>             211 <name>                212 <null_stmt>
213 <number_decl>         214 <numeric_literal>     215 <object_decl>
216 <op_symbol>           217 <or_part>             218 <or_part_list>
219 <parm_decl>           220 <parm_decl_list>      221 <pgm_comp>
222 <pkg_body>            223 <pkg_body_part_opt>   224 <pkg_decl>
225 <pkg_name>            226 <pkg_name_list>       227 <pkg_spec>
228 <pragma>              229 <primary>             230 <primary_list>
231 <private_part_opt>    232 <private_type_def>    233 <proc_or_entry_call>
234 <qualified_expr>      235 <raise_stmt>          236 <range>
237 <range_constr>        238 <real>                239 <real_type_def>
240 <record_type_def>     241 <record_type_repr>    242 <rel>
243 <rel_and_list>        244 <rel_and_then_list>   245 <rel_op>
246 <rel_or_else_list>    247 <rel_or_list>         248 <rel_xor_list>
249 <renaming_decl>       250 <repr_spec>           251 <repr_spec_list>
252 <return_stmt>         253 <select_alt>          254 <select_stmt>
255 <selected_comp>       256 <selective_wait>      257 <seq_of_stmts>
258 <seq_of_stmts_opt>    259 <simple_expr>         260 <simple_expr_list>
261 <simple_stmt>         262 <static_simple_expr>  263 <stmt>
264 <string>              265 <subpgm_body>         266 <subpgm_decl>
267 <subpgm_spec>         268 <subtype_decl>        269 <subtype_ind>
270 <subunit>             271 <system_goal_symbol>  272 <task_body>
273 <task_decl>           274 <task_name>           275 <task_name_list>
276 <task_spec>           277 <task_spec_part>      278 <term>
279 <term_list>           280 <timed_entry_call>    281 <type_decl>
282 <type_def>            283 <unary_op>            284 <unit_name>
285 <unit_name_list>      286 <use_clause>          287 <variant_elt_list>
288 <variant_part>        289 <when_list>           290 <when_part_opt>
291 <with_clause>         292 <with_use_list>
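The token tables in this appendix appear to follow one numbering rule. Reserved words come first, then angle-bracketed terminals, special symbols and non-terminals. Within each group, symbols are numbered consecutively in ascending character-code order, which is why uppercase names such as F and <Goal> come first. This rule is inferred from the listings, not taken from Mystro documentation. The hypothetical helper assign_token_numbers below reproduces the rule and checks it against the FP reserved words and angle-bracketed terminals (tokens 1-41) listed with the FP grammar.

```python
def assign_token_numbers(groups, start=1):
    """Number symbols group by group; within a group, symbols are taken
    in ascending character-code order (Python's default string order)."""
    numbers = {}
    n = start
    for group in groups:
        for symbol in sorted(group):
            numbers[symbol] = n
            n += 1
    return numbers


# FP reserved words and angle-bracketed terminals, deliberately unsorted.
FP_RESERVED = """while xor T F acos and apndl apndr asin atom concat cos distl
distr eq exp id iota last length log mod not null or pair reverse rotl rotr
sin split tl tlr trans""".split()
FP_TERMINALS = ["<fpCmd>", "<eof>", "<classident>", "<classint>",
                "<classplace>", "<classreal>", "<classstrng>"]

numbers = assign_token_numbers([FP_RESERVED, FP_TERMINALS])
print(numbers["F"], numbers["acos"], numbers["xor"])   # 1 3 34
print(numbers["<classident>"], numbers["<fpCmd>"])     # 35 41
```

The same ordering accounts for pairs such as 125 <choice> and 126 <choice_list> in the Ada table, since the closing > sorts before _.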
Input grammar. Grammar option. Default: on

The goal symbol <Goal> is found in rule 1.

[  1] <Goal>
[  2] <S>
[  3] <S>
[  4] <fpInputList>
[  5] <fpInputList>
[  6] <fpInput>
[  7] <fpInput>
[  8] <fpInput>
[  9] <fpInput>
[ 10] <fpInput>
[ 11] <fnDef>
[ 12] <application>
[ 13] <name>
[ 14] <nameList>
[ 15] <nameList>
[ 16] <object>
[ 17] <object>
[ 18] <object>
[ 19] <fpSequence>
[ 20] <fpSequence>
[ 21] <objectList>
[ 22] <objectList>
[ 23] <objectList>
[ 24] <atom>
[ 25] <atom>
[ 26] <atom>
[ 27] <atom>
[ 28] <atom>
[ 29] <atom>
[ 30] <atom>
[ 31] <simpFn>
[ 32] <simpFn>
[ 33] <fpDefined>
[ 34] <fpBuiltin>
[ 35] <fpBuiltin>
[ 36] <fpBuiltin>
[ 37] <fpBuiltin>
[ 38] <fpBuiltin>
[ 39] <fpBuiltin>
[ 40] <fpBuiltin>
[ 41] <fpBuiltin>
[ 42] <fpBuiltin>
[ 43] <fpBuiltin>
[ 44] <fpBuiltin>
[ 45] <fpBuiltin>
[ 46] <fpBuiltin>

::= <S> <eof>
::= <fpInputList>
::= <fpInputList> <fpInput>
::= <fpInput>
::= <fnDef>
::= <application>
::= <fpCmd>
::= <classplace>
::= { <name> <funForm> }
::= <funForm> : <object>
::= <classident>
::= <nameList> <name>
::= <name>
::= <atom>
::= <fpSequence>
::= ?
::= < >
::= < <objectList> >
::= <objectList> , <object>
::= <objectList> <object>
::= <object>
::= T
::= F
::= <>
::= <classstrng>
::= <classident>
::= <classint>
::= <classreal>
::= <fpDefined>
::= <fpBuiltin>
::= <name>
::= <selectFn>
::= tl
::= id
::= atom
::= not
::= eq
::= <relFn>
::= null
::= reverse
::= distl
::= distr
::= length
::= <binaryFn>
[ 47] <fpBuiltin>
[ 48] <fpBuiltin>
[ 49] <fpBuiltin>
[ 50] <fpBuiltin>
[ 51] <fpBuiltin>
[ 52] <fpBuiltin>
[ 53] <fpBuiltin>
[ 54] <fpBuiltin>
[ 55] <fpBuiltin>
[ 56] <fpBuiltin>
[ 57] <fpBuiltin>
[ 58] <fpBuiltin>
[ 59] <selectFn>
[ 60] <relFn>
[ 61] <relFn>
[ 62] <relFn>
[ 63] <relFn>
[ 64] <relFn>
[ 65] <relFn>
[ 66] <binaryFn>
[ 67] <binaryFn>
[ 68] <binaryFn>
[ 69] <binaryFn>
[ 70] <binaryFn>
[ 71] <binaryFn>
[ 72] <binaryFn>
[ 73] <libFn>
[ 74] <libFn>
[ 75] <libFn>
[ 76] <libFn>
[ 77] <libFn>
[ 78] <libFn>
[ 79] <libFn>
[ 80] <funForm>
[ 81] <funForm>
[ 82] <otherFun>
[ 83] <otherFun>
[ 84] <otherFun>
[ 85] <otherFun>
[ 86] <otherFun>
[ 87] <otherFun>
[ 88] <otherFun>
[ 89] <otherFun>
[ 90] <otherFun>
[ 91] <while>
[ 92] <conditional>
[ 93] <construction>
[ 94] <construction>
[ 95] <formList>
[ 96] <formList>

::= trans
::= apndl
::= apndr
::= tlr
::= rotl
::= rotr
::= iota
::= pair
::= split
::= concat
::= last
::= <libFn>
::= <classint>
::= <=
::= <
::= >
::= +
::= *
::= /
::= or
::= and
::= xor
::= sin
::= cos
::= asin
::= acos
::= log
::= exp
::= mod
::= <funForm> @ <otherFun>
::= <otherFun>
::= <classplace>
::= <simpFn>
::= <construction>
::= <conditional>
::= <while>
::= <constantFn>
::= <insertion>
::= <alpha>
::= ( <funForm> )
::= ( while <funForm> <funForm> )
::= ( <funForm> -> <funForm> ; <funForm> )
::= [ <formList> ]
::= [ ]
::= <formList> , <funForm>
::= <funForm>
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[ 97]  <constantFn>  : : 
[ 98]  <lnsertlon>  : : 
[ 99]  <lnsertlon>  : : 
[100]  <alpha>  : : 


System 
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% <ob]ect> 

! <otherFun> 
I <otherFun> 
* <otherFun> 
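The grammar names the FP built-in functions but does not define them. As an illustration only, the sketch below gives several of them the meanings they have in Backus's FP. FP sequences are modelled as Python tuples, and the undefined result (bottom) is modelled by raising ValueError. Whether the FP processor described by this grammar follows exactly these definitions is an assumption.

```python
# FP sequences are modelled as Python tuples; an undefined result
# ("bottom" in FP) is modelled by raising ValueError.

def tl(x):
    """<x1, ..., xn> -> <x2, ..., xn>; undefined for the empty sequence."""
    if not x:
        raise ValueError("tl of an empty sequence")
    return x[1:]


def tlr(x):
    """<x1, ..., xn> -> <x1, ..., xn-1>; undefined for the empty sequence."""
    if not x:
        raise ValueError("tlr of an empty sequence")
    return x[:-1]


def apndl(arg):
    """<y, <z1, ..., zn>> -> <y, z1, ..., zn>"""
    y, z = arg
    return (y,) + tuple(z)


def apndr(arg):
    """<<y1, ..., yn>, z> -> <y1, ..., yn, z>"""
    y, z = arg
    return tuple(y) + (z,)


def distl(arg):
    """<y, <z1, ..., zn>> -> <<y, z1>, ..., <y, zn>>"""
    y, z = arg
    return tuple((y, zi) for zi in z)


def distr(arg):
    """<<y1, ..., yn>, z> -> <<y1, z>, ..., <yn, z>>"""
    y, z = arg
    return tuple((yi, z) for yi in y)


def trans(x):
    """Transpose a sequence of equal-length sequences."""
    if len({len(row) for row in x}) > 1:
        raise ValueError("trans needs rows of equal length")
    return tuple(zip(*x))


def rotl(x):
    """<x1, x2, ..., xn> -> <x2, ..., xn, x1>"""
    return x[1:] + x[:1]


def iota(n):
    """n -> <1, 2, ..., n>"""
    return tuple(range(1, n + 1))


print(distl((0, (1, 2, 3))))          # ((0, 1), (0, 2), (0, 3))
print(trans(((1, 2, 3), (4, 5, 6))))  # ((1, 4), (2, 5), (3, 6))
```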
List of tokens and their token numbers. Tokens option. Default: on

The reserved words and their token numbers are:

 1 F              2 T              3 acos
 4 and            5 apndl          6 apndr
 7 asin           8 atom           9 concat
10 cos           11 distl         12 distr
13 eq            14 exp           15 id
16 iota          17 last          18 length
19 log           20 mod           21 not
22 null          23 or            24 pair
25 reverse       26 rotl          27 rotr
28 sin           29 split         30 tl
31 tlr           32 trans         33 while
34 xor

The angle-bracketed terminals and their token numbers are:

35 <classident> 36 <classint>   37 <classplace>
38 <classreal>  39 <classstrng> 40 <eof>
41 <fpCmd>

The special symbols and their token numbers are:

42 !    43 %    44 &    45 (    46 )    47 *
48 +    49 ,    50 -    51 ->   52 /    53 :
54 ;    55 <    56 <=   57 <>   58 =    59 >
60 >=   61 ?    62 @    63 [    64 ]    65
66      67 |    68 }    69 ~=

The non-terminals and their token numbers are:

70 <Goal>         71 <S>            72 <alpha>
73 <application>  74 <atom>         75 <binaryFn>
76 <conditional>  77 <constantFn>   78 <construction>
79 <fnDef>        80 <formList>     81 <fpBuiltin>
82 <fpDefined>    83 <fpInput>      84 <fpInputList>
85 <fpSequence>   86 <funForm>      87 <insertion>
88 <libFn>        89 <name>         90 <nameList>
91 <object>       92 <objectList>   93 <otherFun>
94 <relFn>        95 <selectFn>     96 <simpFn>
97 <while>


Mystro Translator Writing System Version 7.0, June 1983

Pascal grammar Page 1

Input grammar. Grammar option. Default: on

The goal symbol <full_program> is found in rule 1.

[  1] <full_program> ::= <program> <eof>
[  2] <program> ::= <program_head> <block> .
[  3] <program> ::= <declarations>
[  4] <program> ::=
[  5] <program_head> ::= program <classident> ;
[  6] <program_head> ::= program <classident> ( <ext_file_part> )
[  7] <ext_file_part> ::= <external_file>
[  8] <ext_file_part> ::= <ext_file_part> , <external_file>
[  9] <external_file> ::= <classident>
[ 10] <declarations> ::= <decl_element>
[ 11] <declarations> ::= <declarations> <decl_element>
[ 12] <decl_element> ::= <include_part>
[ 13] <decl_element> ::= <label_decl>
[ 14] <decl_element> ::= <cnst_def_part>
[ 15] <decl_element> ::= <type_def_part>
[ 16] <decl_element> ::= <var_decl_part>
[ 17] <decl_element> ::= <proc_decl>
[ 18] <decl_element> ::= <fcn_decl>
[ 19] <include_part> ::= # include <classstrng>
[ 20] <include_part> ::= # include <classdqstr>
[ 21] <label_decl> ::= <label_symbol> <label_part> ;
[ 22] <label_symbol> ::= label
[ 23] <label_part> ::= <label>
[ 24] <label_part> ::= <label_part> , <label>
[ 25] <label> ::= <classint>
[ 26] <cnst_def_part> ::= <const_symbol> <const_list>
[ 27] <const_symbol> ::= const
[ 28] <const_list> ::= <const_list> <const_def>
[ 29] <const_list> ::= <const_def>
[ 30] <const_def> ::= <classident> = <constant> ;
[ 31] <const_def> ::= <classplace> ;
[ 32] <constant> ::= <unsigned_num>
[ 33] <constant> ::= <classstrng>
[ 34] <constant> ::= <classident>
[ 35] <constant> ::= <sign> <unsigned_num>
[ 36] <sign> ::= +
[ 37] <sign> ::= -
[ 38] <unsigned_num> ::= <classint>
[ 39] <unsigned_num> ::= <classreal>
[ 40] <type_def_part> ::= <type_symbol> <type_list>
[ 41] <type_symbol> ::= type
[ 42] <type_list> ::= <type_list> <type_def>
[ 43] <type_list> ::= <type_def>
[ 44] <type_def> ::= <classplace> ;
[ 45] <type_def> ::= <classident> = <type> ;
[ 46] <type> ::= <simple_type>
[ 47] <type>
[ 48] <type>
[ 49] <type>
[ 50] <simple_type>
[ 51] <simple_type>
[ 52] <simple_type>
[ 53] <scalar_type>
[ 54] <scalar_type>
[ 55] <struct_type>
[ 56] <struct_type>
[ 57] <struct_type>
[ 58] <struct_type>
[ 59] <array_type>
[ 60] <index_list>
[ 61] <index_list>
[ 62] <index_elt>
[ 63] <record_type>
[ 64] <record_end>
[ 65] <field_list>
[ 66] <field_list>
[ 67] <field_list>
[ 68] <fixed_part>
[ 69] <fixed_part>
[ 70] <record_sect>
[ 71] <record_sect>
[ 72] <variable_list>
[ 73] <variable_list>
[ 74] <variant_part>
[ 75] <tag>
[ 76] <tag>
[ 77] <variant_list>
[ 78] <variant_list>
[ 79] <variant>
[ 80] <variant>
[ 81] <fld_lst_part>
[ 82] <case_lbl_list>
[ 83] <case_lbl_list>
[ 84] <case_label>
[ 85] <set_type>
[ 86] <file_type>
[ 87] <point_type>
[ 88] <var_decl_part>
[ 89] <var_symbol>
[ 90] <var_decl_list>
[ 91] <var_decl_list>
[ 92] <var_decl_list>
[ 93] <var_decl_list>
[ 94] <proc_decl>
[ 95] <fcn_decl>
[ 96] <proc_fcn_foll>

::= <struct_type>
::= packed <struct_type>
::= <point_type>
::= <classident>
::= ( <scalar_type> )
::= <constant> .. <constant>
::= <classident>
::= <scalar_type> , <classident>
::= <array_type>
::= <record_type>
::= <set_type>
::= <file_type>
::= array [ <index_list> ] of <type>
::= <index_elt>
::= <index_list> , <index_elt>
::= <simple_type>
::= record <field_list> <record_end>
::= end
::= <fixed_part>
::= <fixed_part> ; <variant_part>
::= <variant_part>
::= <record_sect>
::= <fixed_part> ; <record_sect>
::= <variable_list> : <type>
::= <classident>
::= <variable_list> , <classident>
::= case <tag> of <variant_list>
::= <classident> : <classident>
::= <classident>
::= <variant>
::= <variant_list> ; <variant>
::= <case_lbl_list> : <fld_lst_part>
::= ( <field_list> )
::= <case_label>
::= <case_lbl_list> , <case_label>
::= <constant>
::= set of <simple_type>
::= file of <type>
::= ^ <classident>
::= <var_symbol> <var_decl_list>
::= var
::= <variable_list> : <type> ;
::= <var_decl_list> <variable_list> : <type>
::= <classplace> ;
::= <var_decl_list> <classplace> ;
::= <proc_heading> ; <proc_fcn_foll> ;
::= <fcn_heading> ; <proc_fcn_foll> ;
::= <block>
[ 97] <proc_fcn_foll>
[ 98] <proc_fcn_foll>
[ 99] <proc_fcn_foll>
[100] <proc_heading>
[101] <proc_name>
[102] <fcn_heading>
[103] <fcn_heading>
[104] <fcn_name>
[105] <parm_list>
[106] <parm_list>
[107] <frml_parm_lst>
[108] <frml_parm_lst>
[109] <frml_parm_sct>
[110] <frml_parm_sct>
[111] <frml_parm_sct>
[112] <frml_parm_sct>
[113] <block>
[114] <block>
[115] <stmt_list>
[116] <stmt_list>
[117] <statement>
[118] <S1>
[119] <S1>
[120] <S1>
[121] <S1>
[122] <S1>
[123] <S1>
[124] <nested_if_stmt>
[125] <nested_if_stmt>
[126] <nested_if_stmt>
[127] <nested_if_stmt>
[128] <non_if_stmt1>
[129] <non_if_stmt1>
[130] <non_if_stmt1>
[131] <non_if_stmt1>
[132] <non_if_stmt2>
[133] <non_if_stmt2>
[134] <non_if_stmt2>
[135] <non_if_stmt2>
[136] <non_if_stmt>
[137] <non_if_stmt>
[138] <non_if_stmt>
[139] <non_if_stmt>
[140] <non_if_stmt>
[141] <non_if_stmt>
[142] <non_if_stmt>
[143] <non_if_stmt>

::= forward
::= external
::= fortran
::= procedure <proc_name> <parm_list>
::= <classident>
::= function <fcn_name> <parm_list> : <classident>
::= function <fcn_name>
::= <classident>
::= ( <frml_parm_lst> )
::= <frml_parm_sct>
::= <frml_parm_lst> ; <frml_parm_sct>
::= var <variable_list> : <classident>
::= <variable_list> : <classident>
::= <proc_heading>
::= <fcn_heading>
::= <declarations> begin <stmt_list> end
::= begin <stmt_list> end
::= <statement>
::= <stmt_list> ; <statement>
::= <S1>
::= if <expression> then <S1>
::= <label> : if <expression> then <S1>
::= if <expression> then <nested_if_stmt> else <S1>
::= <label> : if <expression> then <nested_if_stmt> else <S1>
::= <non_if_stmt1>
::= <label> : <non_if_stmt1>
::= if <expression> then <nested_if_stmt> else <nested_if_stmt>
::= <label> : if <expression> then <nested_if_stmt> else <nested_if_stmt>
::= <non_if_stmt2>
::= <label> : <non_if_stmt2>
::= <for_stmt1>
::= <while_stmt1>
::= <with_stmt1>
::= <non_if_stmt>
::= <for_stmt2>
::= <while_stmt2>
::= <with_stmt2>
::= <non_if_stmt>
::= <assign_stmt>
::= <case_stmt>
::= <classplace>
::= <empty_stmt>
::= goto <label>
::= <proc_stmt>
::= <repeat_stmt>
::= begin <stmt_list> end
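Rules [117]-[135] resolve the dangling else by splitting statements into two classes. <S1> may end in an if-then without an else; in <nested_if_stmt>, every if has its else. Because the then-branch of an if-then-else must be a <nested_if_stmt>, an else always belongs to the nearest preceding unmatched then. The sketch below is a hypothetical recursive-descent parser for a toy subset with single-token conditions and statements. It is not the parser Mystro generates, but it reaches the same attachment by letting each if take an else greedily.

```python
def parse_stmt(tokens, pos=0):
    """Parse one toy statement starting at tokens[pos].

    Toy grammar (hypothetical, for illustration only):
        stmt ::= if COND then stmt [ else stmt ] | NAME
    Returns (tree, next_pos).  An else is consumed by the innermost if
    still waiting for one, the reading the <S1>/<nested_if_stmt> split forces.
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1


tree, end = parse_stmt("if a then if b then x else y".split())
print(tree)   # ('if', 'a', ('if', 'b', 'x', 'y'))
```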
[144] <assign_stmt> ::= <variable> := <expression>
[145] <variable> ::= <classident>
[146] <variable> ::= <variable> [ <express_list> ]
[147] <variable> ::= <variable> . <classident>
[148] <variable> ::= <variable> ^
[149] <case_stmt> ::= case <expression> of <case_list> end
[150] <case_list> ::= <case_element>
[151] <case_list> ::= <case_list> ; <case_element>
[152] <case_element> ::= <case_lbl_list> : <statement>
[153] <case_element> ::=
[154] <empty_stmt> ::=
[155] <for_stmt1> ::= for <for_list> do <statement>
[156] <for_stmt2> ::= for <for_list> do <nested_if_stmt>
[157] <for_list> ::= <classident> := <expression> to <expression>
[158] <for_list> ::= <classident> := <expression> downto <expression>
[159] <for_list> ::= <classplace>
[160] <repeat_stmt> ::= repeat <stmt_list> until <expression>
[161] <while_stmt1> ::= while <expression> do <statement>
[162] <while_stmt2> ::= while <expression> do <nested_if_stmt>
[163] <proc_stmt> ::= <classident> ( <act_parm_list> )
[164] <proc_stmt> ::= <classident>
[165] <proc_stmt> ::= write ( <special_parms> )
[166] <proc_stmt> ::= writeln ( <special_parms> )
[167] <proc_stmt> ::= writeln
[168] <special_parms> ::= <simple_expres> <field_width>
[169] <special_parms> ::= <special_parms> , <simple_expres> <field_width>
[170] <field_width> ::= : <simple_expres> : <simple_expres>
[171] <field_width> ::= : <simple_expres>
[172] <field_width> ::=
[173] <with_stmt1> ::= with <rcd_var_list> do <statement>
[174] <with_stmt2> ::= with <rcd_var_list> do <nested_if_stmt>
[175] <rcd_var_list> ::= <variable>
[176] <rcd_var_list> ::= <rcd_var_list> , <variable>
[177] <rcd_var_list> ::= <classplace>
[178] <express_list> ::= <expression>
[179] <express_list> ::= <express_list> , <expression>
[180] <expression> ::= <simple_expres>
[181] <expression> ::= <simple_expres> <rel_op> <simple_expres>
[182] <expression> ::= <classplace>
[183] <rel_op> ::= =
[184] <rel_op> ::= <>
[185] <rel_op> ::= <
[186] <rel_op> ::= <=
[187] <rel_op> ::= >=
[188] <rel_op> ::= >
[189] <rel_op> ::= in
[190] <simple_expres> ::= <classstrng>
[191] <simple_expres> ::= <term>
[192] <simple_expres> ::= <simple_expres> <add_op> <term>
[193] <add_op> ::= +
[194] <add_op> ::= -
[195] <add_op> ::= or
[196] <term> ::= <factor>
[197] <term> ::= <term> <mult_op> <factor>
[198] <mult_op> ::= *
[199] <mult_op> ::= /
[200] <mult_op> ::= div
[201] <mult_op> ::= mod
[202] <mult_op> ::= and
[203] <factor> ::= <sign> <factor>
[204] <factor> ::= <variable>
[205] <factor> ::= <unsigned_num>
[206] <factor> ::= nil
[207] <factor> ::= ( <expression> )
[208] <factor> ::= [ <element_list> ]
[209] <factor> ::= [ ]
[210] <factor> ::= <classident> ( <act_parm_list> )
[211] <factor> ::= not <factor>
[212] <act_parm_list> ::= <expression>
[213] <act_parm_list> ::= <act_parm_list> , <expression>
[214] <element_list> ::= <element>
[215] <element_list> ::= <element_list> , <element>
[216] <element> ::= <expression>
[217] <element> ::= <expression> .. <expression>
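Rules [180]-[217] fix the operator precedence of this grammar. Relational operators bind loosest and cannot be chained, `or` is an adding operator, and `and` is a multiplying operator, as in standard Pascal. The following minimal recursive-descent sketch covers only part of these rules: signs, `not`, parentheses, and variables or numbers as single tokens. Sets, strings, function calls and <classplace> are omitted. The left-recursive rules [192] and [197] are rewritten as loops, which keeps the operators left-associative. The class and function names are hypothetical.

```python
from typing import List, Optional, Tuple, Union

REL_OPS = {"=", "<>", "<", "<=", ">=", ">", "in"}   # <rel_op>, rules [183]-[189]
ADD_OPS = {"+", "-", "or"}                          # <add_op>, rules [193]-[195]
MULT_OPS = {"*", "/", "div", "mod", "and"}          # <mult_op>, rules [198]-[202]

# Hypothetical tree shape: a leaf is a token string; an operator node is a tuple.
Tree = Union[str, Tuple]


class ExpressionParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def expression(self) -> Tree:
        # [180]/[181]: at most one relational operator per expression.
        left = self.simple_expres()
        if self.peek() in REL_OPS:
            op = self.take()
            left = (op, left, self.simple_expres())
        return left

    def simple_expres(self) -> Tree:
        # [191]/[192]: left recursion turned into a loop (left-associative).
        left = self.term()
        while self.peek() in ADD_OPS:
            op = self.take()
            left = (op, left, self.term())
        return left

    def term(self) -> Tree:
        # [196]/[197], same treatment as simple_expres.
        left = self.factor()
        while self.peek() in MULT_OPS:
            op = self.take()
            left = (op, left, self.factor())
        return left

    def factor(self) -> Tree:
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token in ("+", "-", "not"):      # [203] <sign> <factor>, [211] not <factor>
            return (token, self.factor())
        if token == "(":                    # [207] ( <expression> )
            inner = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if token == ")" or token in REL_OPS or token in ADD_OPS or token in MULT_OPS:
            raise SyntaxError(f"unexpected token {token!r}")
        return token                        # [204]/[205]: variable or unsigned number


def parse(text: str) -> Tree:
    """Parse a whitespace-separated token string as an <expression>."""
    parser = ExpressionParser(text.split())
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these precedences, `parse("a < b or c")` groups as `a < (b or c)`, and `parse("a < b < c")` raises `SyntaxError`.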

0,  June  1983 
List of tokens and their token numbers. Tokens option. Default: on

The reserved words and their token numbers are:

 1 and          2 array        3 begin
 4 case         5 const        6 div
 7 do           8 downto       9 else
10 end         11 external    12 file
13 for         14 fortran     15 forward
16 function    17 goto        18 if
19 in          20 include     21 label
22 mod         23 nil         24 not
25 of          26 or          27 packed
28 procedure   29 program     30 record
31 repeat      32 set         33 then
34 to          35 type        36 until
37 var         38 while       39 with
40 write       41 writeln

The angle-bracketed terminals and their token numbers are:

42 <classdqstr>   43 <classident>   44 <classint>
45 <classplace>   46 <classreal>    47 <classstrng>
48 <eof>

The special symbols and their token numbers are:

49 #      50 (      51 )
52 *      53 +      54 ,
55 -      56 .      57 ..
58 /      59 :      60 :=
61 ;      62 <      63 <=
64 <>     65 =      66 >
67 >=     68 [      69 ]
70 ^

The non-terminals and their token numbers are:

 71 <S1>              72 <act_parm_list>     73 <add_op>
 74 <array_type>      75 <assign_stmt>       76 <block>
 77 <case_element>    78 <case_label>        79 <case_lbl_list>
 80 <case_list>       81 <case_stmt>         82 <cnst_def_part>
 83 <const_def>       84 <const_list>        85 <const_symbol>
 86 <constant>        87 <decl_element>      88 <declarations>
 89 <element>         90 <element_list>      91 <empty_stmt>
 92 <express_list>    93 <expression>        94 <ext_file_part>
 95 <external_file>   96 <factor>            97 <fcn_decl>
 98 <fcn_heading>     99 <fcn_name>         100 <field_list>
101 <field_width>    102 <file_type>        103 <fixed_part>
104 <fld_lst_part>   105 <for_list>         106 <for_stmt1>
107 <for_stmt2>      108 <frml_parm_lst>    109 <frml_parm_sct>
110 <full_program>   111 <include_part>     112 <index_elt>
113 <index_list>     114 <label>            115 <label_decl>
116 <label_part>     117 <label_symbol>     118 <mult_op>
119 <nested_if_stmt> 120 <non_if_stmt1>     121 <non_if_stmt2>
122 <non_if_stmt>    123 <parm_list>        124 <point_type>
125 <proc_decl>      126 <proc_fcn_foll>    127 <proc_heading>
128 <proc_name>      129 <proc_stmt>        130 <program>
131 <program_head>   132 <rcd_var_list>     133 <record_end>
134 <record_sect>    135 <record_type>      136 <rel_op>
137 <repeat_stmt>    138 <scalar_type>      139 <set_type>
140 <sign>           141 <simple_expres>    142 <simple_type>
143 <special_parms>  144 <statement>        145 <stmt_list>
146 <struct_type>    147 <tag>              148 <term>
149 <type>           150 <type_def>         151 <type_def_part>
152 <type_list>      153 <type_symbol>      154 <unsigned_num>
155 <var_decl_list>  156 <var_decl_part>    157 <var_symbol>
158 <variable>       159 <variable_list>    160 <variant>
161 <variant_list>   162 <variant_part>     163 <while_stmt1>
164 <while_stmt2>    165 <with_stmt1>       166 <with_stmt2>
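To show how these token numbers might be used, here is a minimal Python lexer sketch that maps source text to (token number, lexeme) pairs. It is not the original scanner. It makes three assumptions:

- Reserved words are case-insensitive.
- The special symbols numbered 54, 57, 61, 62, 63, 64 and 70 are `,`, `..`, `;`, `<`, `<=`, `<>` and `^`. This reading is inferred from the order of the list and from the use of these symbols in the grammar.
- Only reserved words, special symbols, identifiers (43) and unsigned integers (44) are recognized. Strings, reals, comments and <classplace> are left out.

```python
import re
from typing import List, Tuple

# Reserved words 1-41, in the order of the token listing.
RESERVED = (
    "and array begin case const div do downto else end external file for "
    "fortran forward function goto if in include label mod nil not of or "
    "packed procedure program record repeat set then to type until var while "
    "with write writeln"
).split()
RESERVED_WORDS = {word: number for number, word in enumerate(RESERVED, start=1)}

# Special symbols 49-70 (54, 57, 61-64 and 70 read as ',', '..', ';', '<', '<=', '<>', '^').
SYMBOLS = {
    "#": 49, "(": 50, ")": 51, "*": 52, "+": 53, ",": 54, "-": 55, ".": 56,
    "..": 57, "/": 58, ":": 59, ":=": 60, ";": 61, "<": 62, "<=": 63, "<>": 64,
    "=": 65, ">": 66, ">=": 67, "[": 68, "]": 69, "^": 70,
}
CLASS_IDENT, CLASS_INT = 43, 44

# Two-character symbols are tried first so that the longest match wins.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+)|([A-Za-z][A-Za-z0-9_]*)|(\.\.|:=|<=|<>|>=|[#()*+,\-./:;<=>\[\]^]))"
)


def tokenize(source: str) -> List[Tuple[int, str]]:
    """Return (token number, lexeme) pairs for the subset described above."""
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}: {source[pos:pos + 10]!r}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append((CLASS_INT, number))
        elif word is not None:
            # Assumption: reserved words are case-insensitive.
            tokens.append((RESERVED_WORDS.get(word.lower(), CLASS_IDENT), word))
        else:
            tokens.append((SYMBOLS[symbol], symbol))
        pos = match.end()
    return tokens
```

For instance, `tokenize("a[1..5]")` gives `[(43, "a"), (68, "["), (44, "1"), (57, ".."), (44, "5"), (69, "]")]`, with `..` recognized as one token rather than two periods.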
An Example of a Stepwise Development Methodology

LEE  A.  BENZINGER* 


Abstract- We give an example of a stepwise development methodology for software which uses the Hoare calculus and a notion of partial correctness of programs with respect to specifications. We prove that this example falls within the framework provided by an abstract mathematical model for software development. Since the model possesses some of the basic properties that we would expect of an idealized development, it follows that the example also possesses these properties. This paper uses the technique of comparing an example of a software development methodology with an abstract model for software development in order to gain insight into the methodology.

Index  Terms-  Hoare  logic,  partial 
correctness,  stepwise  development. 


1.  Introduction 

The task of developing software which meets a given specification is very difficult. Various approaches have been suggested to make the task more tractable. In [10] the problem of designing an algorithm which meets a specification is considered. In [13] an axiomatic approach to correctness proofs for programs is given, while in [22] and [11] stepwise approaches to program development are considered. The Vienna Development Method (VDM) [14] is a software development method which combines the notion of stepwise refinement with proofs of correctness at each step. In [16] a stepwise approach to software design is discussed which includes the notion of correctness of a software component with respect to a specification at each step. The purpose of this paper is to construct a mathematically rigorous foundation for the stepwise approach to the development of software. This work is part of the SAGA (Software Automation, Generation and Administration) project, which is concerned with providing an environment to support the theory and practice of software development [3-7,12,15,20,21].

In order to compare different stepwise design methodologies, to study the properties of a particular design methodology, or to develop new design methodologies, it is valuable to have an abstract mathematical model of the stepwise development process. A model serves as a standard for comparison of design methodologies. If we can prove that design methodology A has the properties of an abstract model and design methodology B either does not have these properties or it is not known whether B possesses these properties, then we have a basis for choosing A over B. In developing a new design methodology, a proof that it satisfies the properties of an abstract model is a guarantee that the software component obtained as a result of using the methodology will at least possess the properties of the abstract model. This is a significant improvement over the situation in which we implicitly assume that a design methodology has desirable properties because it seems intuitively reasonable.

In [2] an abstract mathematical model for the stepwise development of software is presented. The model is quite simple in that it describes an idealized development. Issues such as backtracking or the effect on a development of changing the original specification are not considered. The model is fairly general since it is independent of the notions of specification, correctness, and implementation. These notions are dependent upon a particular design methodology, not the abstract model.

In this paper we present an abstract model, an example of a stepwise development methodology that has the properties of the abstract model, and a sketch of the proof that the properties are satisfied by the example. Section 2 contains an overview of the abstract model. In Section 3 we give an outline of the construction of the example and the proof that it satisfies the requirements of the model. Section 4 contains definitions which are used in the construction of the example. Sections 5 and 6 contain proofs which indicate in more detail the methods used for showing that the example does have the properties of the abstract model. Section 7 contains the conclusion.

2. The Abstract Model

In this section we present an informal discussion of the abstract model. See [2] for further details and proofs. We define an abstract program $A$ as an ordered pair, $(S, C)$, where $S$ is a specification and $C$ is the set of all implementations which are correct with respect to $S$. The set $C$ may be empty. This can occur, for example, when $S$ is inconsistent and there exists no implementation which is correct with respect to $S$. As already noted, the notions of specification, implementation, and correctness are left undefined in the discussion of the abstract model. We are primarily interested in a model for stepwise design methodologies which allows us to study those properties which are intrinsic to an idealized stepwise development process, independent of the notions of specification, implementation, and correctness.

A development $D$ with respect to a specification $S_0$ is an $(n+1)$-tuple of abstract programs, $(A_0, A_1, \ldots, A_n)$, for some nonnegative integer $n$ such that for each $i$, $0 \le i \le n$, $A_i = (S_i, C_i)$. Let $C$ be a set. By $|C|$ we mean the cardinality of $C$. $D$ is correct if $C_{i+1} \subseteq C_i$ for $0 \le i < n$. $D$ is complete if $|C_n| = 1$. $D$ is incomplete if $|C_n| > 1$. Correct and complete developments are those which start out with an abstract program $A_0 = (S_0, C_0)$ as the first member of the ordered $(n+1)$-tuple which is the development. $S_0$ is the original specification. The last member of the development is $(S_n, C_n)$; $C_n$ is a set which contains a single implementation, and $S_n$ is the last specification in the development. The sets of implementations form a nested family; that is, for each integer $i$, $0 \le i < n$, $C_{i+1} \subseteq C_i$. Because $\subseteq$ is transitive, $C_j \subseteq C_i$ whenever $i \le j$. It follows that any implementation which is correct with respect to a given specification in a development is also correct with respect to all preceding specifications in the development. This property ensures that the implementation obtained from a development is correct with respect to the original specification. Correct and incomplete developments are developments that are, intuitively, correct so far but not finished. The last abstract program in a correct and incomplete development is an ordered pair $(S_n, C_n)$ in which $C_n$ contains more than one implementation that is correct with respect to the specification $S_n$. Thus $S_n$ specifies a family of implementations rather than a single implementation.
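To make these definitions concrete, the following Python sketch checks whether a development is correct, complete, or incomplete. It is only a mirror of the definitions under simplifying assumptions. Specifications are opaque labels, and finite sets of implementation labels stand in for the (generally infinite) sets $C_i$. The function names are hypothetical. Nothing here says how the sets $C_i$ are obtained.

```python
from typing import FrozenSet, Sequence, Tuple

# An abstract program (S, C): a specification label and a finite set of
# implementation labels standing in for the set C.
AbstractProgram = Tuple[str, FrozenSet[str]]


def is_correct(development: Sequence[AbstractProgram]) -> bool:
    """C_{i+1} is a subset of C_i for 0 <= i < n."""
    return all(later[1] <= earlier[1]
               for earlier, later in zip(development, development[1:]))


def is_complete(development: Sequence[AbstractProgram]) -> bool:
    """|C_n| = 1."""
    return len(development[-1][1]) == 1


def is_incomplete(development: Sequence[AbstractProgram]) -> bool:
    """|C_n| > 1."""
    return len(development[-1][1]) > 1
```

As an example, take `D = [("S0", frozenset({"w1", "w2", "w3"})), ("S1", frozenset({"w1", "w2"})), ("S2", frozenset({"w2"}))]`. Then `is_correct(D)` and `is_complete(D)` are both `True`, and the final implementation `w2` lies in every $C_i$, as the nested-family argument predicts.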

In a stepwise development, developments are formed from steps. A development step is the result of a process of going from one abstract program to another. A development step with respect to a specification $S_i$ is an ordered pair of abstract programs, $(A_i, A_{i+1})$, such that $A_i = (S_i, C_i)$ and $A_{i+1} = (S_{i+1}, C_{i+1})$. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a development with respect to a specification $S_0$. Let $((S_j, C_j), (S_{j+1}, C_{j+1}))$ be a development step with respect to the specification $S_j$. The development $D$ contains the development step if $j = i$ for some integer $i$, $0 \le i \le n - 1$; that is, the development step is $((S_i, C_i), (S_{i+1}, C_{i+1}))$, where $(S_i, C_i)$ and $(S_{i+1}, C_{i+1})$ are successive members of the $(n+1)$-tuple which is the development with respect to the specification $S_0$. A development step with respect to a specification $S_i$, for some nonnegative integer $i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is correct if the following hold:

(1) $C_i, C_{i+1} \neq \emptyset$

(2) $C_{i+1} \subseteq C_i$

A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is complete if $|C_{i+1}| = 1$. A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is incomplete if $|C_{i+1}| > 1$.
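The step definitions can be mirrored the same way. The sketch below again uses finite label sets and hypothetical names. It reads condition (1) as requiring both $C_i$ and $C_{i+1}$ to be nonempty, and it treats "contains" as "is a pair of successive members of the development".

```python
from typing import FrozenSet, Sequence, Tuple

AbstractProgram = Tuple[str, FrozenSet[str]]
Step = Tuple[AbstractProgram, AbstractProgram]


def is_correct_step(step: Step) -> bool:
    """(1) C_i and C_{i+1} are nonempty; (2) C_{i+1} is a subset of C_i."""
    (_, c_i), (_, c_next) = step
    return bool(c_i) and bool(c_next) and c_next <= c_i


def is_complete_step(step: Step) -> bool:
    """|C_{i+1}| = 1."""
    return len(step[1][1]) == 1


def is_incomplete_step(step: Step) -> bool:
    """|C_{i+1}| > 1."""
    return len(step[1][1]) > 1


def contains(development: Sequence[AbstractProgram], step: Step) -> bool:
    """True if the step is a pair of successive members of the development."""
    return any((a, b) == step for a, b in zip(development, development[1:]))
```

Note that a step into an empty set of implementations is never correct, even though $\emptyset \subseteq C_i$ holds trivially.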

Developments can be extended by development steps to form new developments. We state a result about extensions of developments by development steps.

Theorem: Let $D$ be a correct and incomplete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete and correct development step with respect to $S_n$. Let $D_1$ be $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a correct and complete development with respect to the specification $S_0$ which contains the given development step.

Developments can be constructed from development steps. The properties of the resulting developments depend upon the properties of the development steps used in their construction. The following result shows that development steps can be viewed as "building blocks" for the construction of developments.

Theorem: Let $((S_0, C_0), (S_1, C_1))$, $((S_1, C_1), (S_2, C_2))$, $\ldots$, $((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of $n$ correct development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$, respectively, for some positive integer $n$. Furthermore, suppose that $((S_{n-1}, C_{n-1}), (S_n, C_n))$ is a complete development step. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a correct and complete development with respect to the specification $S_0$.
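The "building blocks" reading of this theorem can be illustrated on a single finite instance. This illustrates the statement only; the proofs are in [2]. The hypothetical helper below checks the hypotheses (every step correct, the last step complete, consecutive steps sharing an abstract program), joins the steps into $(A_0, \ldots, A_n)$, and returns the result. Its correctness and completeness can then be checked directly.

```python
from typing import FrozenSet, List, Sequence, Tuple

AbstractProgram = Tuple[str, FrozenSet[str]]
Step = Tuple[AbstractProgram, AbstractProgram]


def is_correct_step(step: Step) -> bool:
    """Both sets nonempty and C_{i+1} a subset of C_i."""
    (_, c_i), (_, c_next) = step
    return bool(c_i) and bool(c_next) and c_next <= c_i


def development_from_steps(steps: Sequence[Step]) -> List[AbstractProgram]:
    """Join n correct steps, the last of which is complete, into (A_0, ..., A_n)."""
    if not steps or not all(is_correct_step(step) for step in steps):
        raise ValueError("every step must be correct")
    if len(steps[-1][1][1]) != 1:
        raise ValueError("the last step must be complete")
    development = [steps[0][0]]
    for first, second in steps:
        if first != development[-1]:
            raise ValueError("consecutive steps do not share an abstract program")
        development.append(second)
    return development
```

For the steps `(A0, A1)` and `(A1, A2)` with $C_0 = \{w_1, w_2, w_3\}$, $C_1 = \{w_1, w_2\}$ and $C_2 = \{w_2\}$, the joined development is `[A0, A1, A2]`, which is correct and complete as the theorem states.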

3.  A Stepwise  Design  Methodology 

In this section we outline the approach we use to construct an example of a stepwise design methodology and to prove that the methodology actually does satisfy the constraints of the abstract model. Initially, we need to define the concepts of an implementation, a specification, and correctness with respect to a specification. In the example, an implementation is a while-program, a program in a programming language which allows assignment statements, composed statements, conditional statements, and while statements. A specification is in terms of pre- and post-conditions and the constructs of the while-programming language. By correctness with respect to a specification we mean an extension of the notion of partial correctness with respect to formulas from first-order logic. Most of the notation and terminology which we use is in [18]. We try to be consistent with [18] when we introduce new concepts and notation.

3.1.  A Correct  Development  Step 

In a stepwise design methodology, we must be able to construct correct development steps. In addition to a notion of correctness, it is necessary to have a deductive system within which we can prove that implementations are correct with respect to specifications. For the example, we use the axiomatic method of Hoare. The Hoare method is used in program verification to prove while-programs partially correct, but in the example the method is extended so that it is used to prove implementations partially correct with respect to specifications at each step in a development. Generally, at each step in a development except for the last, a specification will specify a family of implementations rather than a single while-program. In the terminology of the abstract model, the problem of program verification is the following: given an abstract program, $A = (S, C)$, and a program $W$, prove that $W \in C$; that is, given a program $W$, show that it is correct with respect to the specification $S$. In a stepwise methodology, given an abstract program, $A_i = (S_i, C_i)$, we must be able to construct a new abstract program, $A_{i+1} = (S_{i+1}, C_{i+1})$, so that $(A_i, A_{i+1})$ is a correct development step. In the example, given an abstract program, $A_i = (S_i, C_i)$, we construct a new abstract program, $A_{i+1} = (S_{i+1}, C_{i+1})$. We also prove that $C_{i+1} \subseteq C_i$; that is, $W \in C_{i+1}$ implies that $W \in C_i$. The construction of the new abstract program $A_{i+1}$ and the proof that the pair of abstract programs $(A_i, A_{i+1})$ is a correct development step depend upon specification transformations,

$$T: S_i \to S_{i+1}.$$

These transformations are defined explicitly.

3.2.  A Development 

From the abstract model we know that to construct a correct and complete development it is sufficient to construct a series of correct development steps followed by a single correct and complete development step. A correct and complete development step is a correct development step, $(A_i, A_{i+1}) = ((S_i, C_i), (S_{i+1}, C_{i+1}))$, with the additional property that $|C_{i+1}| = 1$. The specification $S_{i+1}$ must be detailed enough that it specifies exactly one while-program. We use the concept of an annotated program to describe such a specification. In [2] we prove that for an abstract program $A = (S, C)$ such that $S$ is an annotated program and $C \neq \emptyset$, it follows that $|C| = 1$.

3.3.  Stepwise  Verification 

Given a correct development step, $(A_i, A_{i+1}) = ((S_i, C_i), (S_{i+1}, C_{i+1}))$, and a while-program $W \in C_i$, we introduce proof rules which, when satisfied, enable us to prove that $W \in C_{i+1}$. Constraints beyond $W \in C_i$ are necessary, since $C_{i+1}$ is only a subset of $C_i$. The theorems which use these proof rules formalize the stepwise verification process, and their proofs clarify it. For any correct development step, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, and any $W \in C_i$, the conditions under which we can prove $W \in C_{i+1}$ depend upon the assumption that we will be able to prove the "incompletely specified parts" of $W$ correct with respect to $S_{i+1}$. Because it is only under this assumption that we can prove $W \in C_{i+1}$, we do not have a verification of an implementation in as strong a sense as the verification of a program until we reach the last development step, which is correct and complete. At this point, no additional assumptions concerning the implementation $W$ and the specification $S_{i+1}$ are necessary to prove $W \in C_{i+1}$, since $W$ is completely specified by $S_{i+1}$.

These proof rules are somewhat similar to the rules in [14] for control refinement. Unlike [14], the rules we use are embedded in a methodology which uses the Hoare calculus for obtaining derivations. In Section 6 we give an example of a lemma which uses the proof rules in a special case of a composed statement specification transformation.

4.  Basic  Definitions 

In this section we give a precise definition of the syntax of while-programs, the syntax of specifications in terms of pre- and post-conditions, partial correctness of a while-program with respect to a specification, and the syntax of annotated programs. The definition of partial correctness of a while-program with respect to a specification is an extension of the notion of partial correctness of a while-program with respect to formulas. We first fix some terms used in these definitions. Let $B$ be a basis for predicate logic, $V$ the set of variables, $T_B$ the set of terms, $QFF_B$ the set of quantifier-free formulas, and $WFF_B$ the set of all well-formed formulas of first-order predicate logic over the basis $B$.

Definition: (Syntax of $L_W$) The set, $L_W^B$, of while-programs for the basis $B$ is defined inductively as follows:

a) Assignment statement: If $x$ is a variable from $V$ and $t$ is a term from $T_B$, then

$x := t$

is a while-program.

b) Composed statement: If $W_1, W_2$ are while-programs, then

$W_1 ; W_2$

is a while-program.

c) Conditional statement: If $W_1, W_2$ are while-programs and $e$ is a quantifier-free formula from $QFF_B$, then

if $e$ then $W_1$ else $W_2$ fi

is a while-program.

d) While statement: If $W_1$ is a while-program and $e$ is a quantifier-free formula from $QFF_B$, then

while $e$ do $W_1$ od

is a while-program.
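The inductive definition maps directly onto an abstract syntax tree. The Python sketch below is an illustration under stated assumptions, not the operational semantics of [18]:

- Terms and quantifier-free formulas are Python functions on a state (a dict from variable names to integers). This plays the role of one fixed interpretation $I$ over the integers.
- A step budget ("fuel") stands in for non-termination.

All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union

State = Dict[str, int]
Term = Callable[[State], int]     # value of a term t in the chosen interpretation
QFF = Callable[[State], bool]     # truth value of a quantifier-free formula e


@dataclass(frozen=True)
class Assign:          # x := t
    var: str
    term: Term


@dataclass(frozen=True)
class Seq:             # W1 ; W2
    first: "WhileProgram"
    second: "WhileProgram"


@dataclass(frozen=True)
class If:              # if e then W1 else W2 fi
    cond: QFF
    then_part: "WhileProgram"
    else_part: "WhileProgram"


@dataclass(frozen=True)
class While:           # while e do W1 od
    cond: QFF
    body: "WhileProgram"


WhileProgram = Union[Assign, Seq, If, While]


def run(w: WhileProgram, state: State, fuel: int = 10_000) -> Optional[Tuple[State, int]]:
    """Execute w from state; return (final state, unused fuel), or None if the
    fuel runs out, which this sketch treats as non-termination."""
    if fuel <= 0:
        return None
    fuel -= 1
    if isinstance(w, Assign):
        new_state = dict(state)
        new_state[w.var] = w.term(state)
        return new_state, fuel
    if isinstance(w, Seq):
        after_first = run(w.first, state, fuel)
        if after_first is None:
            return None
        return run(w.second, *after_first)
    if isinstance(w, If):
        branch = w.then_part if w.cond(state) else w.else_part
        return run(branch, state, fuel)
    if isinstance(w, While):
        while w.cond(state):
            after_body = run(w.body, state, fuel)
            if after_body is None:
                return None
            state, fuel = after_body
        return state, fuel
    raise TypeError(f"not a while-program: {w!r}")
```

For example, the program y := 1 ; while x > 0 do y := y * x ; x := x - 1 od, run from the state {x: 5}, ends with x = 0 and y = 120.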

Definition: (Syntax of $L_S$) The set, $L_S^B$, of specifications for the basis $B$ is defined inductively as follows:

a) Simple specification: If $p, q$ are formulas from $WFF_B$, then

$\{p\}\ \{q\}$

is a specification.

b) Assignment specification: If $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ x := t\ \{q\}$

is a specification.

c) Composed specification: If $S_1, S_2$ are specifications and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ S_1 ; S_2\ \{q\}$

is a specification.

d) Conditional specification: If $S_1, S_2$ are specifications, $e$ is a quantifier-free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}$ if $e$ then $S_1$ else $S_2$ fi $\{q\}$

is a specification.

e) While specification: If $S_1$ is a specification, $e$ is a quantifier-free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}$ while $e$ do $S_1$ od $\{q\}$

is a specification.
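The five specification forms can likewise be written as a small tree type. The following is a minimal Python sketch, assuming formulas, terms and conditions are kept as uninterpreted strings. The class and function names are hypothetical. `render` writes a specification back in the concrete syntax of the definition. `pre_post` returns the outer pair of formulas of a specification.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Simple:            # {p} {q}
    pre: str
    post: str


@dataclass(frozen=True)
class AssignSpec:        # {p} x := t {q}
    pre: str
    var: str
    term: str
    post: str


@dataclass(frozen=True)
class ComposedSpec:      # {p} S1 ; S2 {q}
    pre: str
    first: "Spec"
    second: "Spec"
    post: str


@dataclass(frozen=True)
class CondSpec:          # {p} if e then S1 else S2 fi {q}
    pre: str
    cond: str
    then_part: "Spec"
    else_part: "Spec"
    post: str


@dataclass(frozen=True)
class WhileSpec:         # {p} while e do S1 od {q}
    pre: str
    cond: str
    body: "Spec"
    post: str


Spec = Union[Simple, AssignSpec, ComposedSpec, CondSpec, WhileSpec]


def pre_post(spec: Spec) -> Tuple[str, str]:
    """The outer pre- and post-condition of a specification."""
    return spec.pre, spec.post


def is_structured(spec: Spec) -> bool:
    """Specifications that are not simple are structured."""
    return not isinstance(spec, Simple)


def render(spec: Spec) -> str:
    """Concrete syntax, e.g. '{p} x := t {q}'."""
    if isinstance(spec, Simple):
        body = ""
    elif isinstance(spec, AssignSpec):
        body = f" {spec.var} := {spec.term}"
    elif isinstance(spec, ComposedSpec):
        body = f" {render(spec.first)} ; {render(spec.second)}"
    elif isinstance(spec, CondSpec):
        body = f" if {spec.cond} then {render(spec.then_part)} else {render(spec.else_part)} fi"
    elif isinstance(spec, WhileSpec):
        body = f" while {spec.cond} do {render(spec.body)} od"
    else:
        raise TypeError(f"not a specification: {spec!r}")
    return f"{{{spec.pre}}}{body} {{{spec.post}}}"
```

A simple specification leaves its program part entirely open. The structured forms fix the shape of the program while still allowing simple specifications as components.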

We call specifications which are not simple structured specifications. An operational semantics for $L_W^B$ in terms of an interpretation $I$ for the basis $B$, and a definition of partial correctness with respect to formulas, are given in [18].
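Partial correctness of $W$ with respect to formulas $p$ and $q$ requires that every terminating run of $W$ from a state satisfying $p$ ends in a state satisfying $q$. Runs that do not terminate impose no obligation. The sketch below is a hypothetical testing aid, not a proof method. It searches a finite sample of states for a counterexample. The program is any Python function that returns the final state, or `None` when it gives up, which stands in for non-termination. A `None` result from the search is evidence, not a proof.

```python
from typing import Callable, Dict, Iterable, Optional

State = Dict[str, int]
Formula = Callable[[State], bool]
Program = Callable[[State], Optional[State]]   # None: no termination observed


def find_counterexample(pre: Formula, program: Program, post: Formula,
                        states: Iterable[State]) -> Optional[State]:
    """Return a sampled initial state that violates partial correctness, or None."""
    for state in states:
        if not pre(state):
            continue
        final = program(dict(state))      # run on a copy of the initial state
        if final is not None and not post(final):
            return state
    return None


def countdown(state: State, fuel: int = 1000) -> Optional[State]:
    """while x <> 0 do x := x - 1 od  (does not terminate when x starts negative)."""
    while state["x"] != 0:
        if fuel == 0:
            return None
        state["x"] -= 1
        fuel -= 1
    return state
```

With `samples = [{"x": v} for v in range(-5, 6)]`, the call `find_counterexample(lambda s: True, countdown, lambda s: s["x"] == 0, samples)` finds nothing. The negative starting values never terminate, so they do not count against partial correctness. Changing the post-condition to `x = 1` yields the counterexample `{"x": 0}`.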

Definition: (Correctness with Respect to Specifications) Let $W$ be a while-program from $L_W$. The notion that $W$ is partially correct with respect to the specification $S$ (in the interpretation $I$) is defined inductively (the induction being on the specification $S$) as follows:

a) If $S$ is a simple specification,

$\{p\}\ \{q\}$,

where $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is partially correct with respect to $p$ and $q$ (in the interpretation $I$).

b) If $S$ is an assignment specification,

$\{p\}\ x := t\ \{q\}$,

where $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if the following hold:

(i) $W$ is $x := t$.
(ii) $W$ is partially correct with respect to $p$ and $q$.

c) If $S$ is a composed specification,

$\{p\}\ S_1 ; S_2\ \{q\}$,

where $S_1, S_2$ are specifications from $L_S^B$ and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if the following hold:

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_W$.
(ii) $W$ is partially correct with respect to $p$ and $q$.
(iii) $W_1$ is partially correct with respect to the specification $S_1$.
(iv) $W_2$ is partially correct with respect to the specification $S_2$.

d) If $S$ is a conditional specification,

$\{p\}$ if $e$ then $S_1$ else $S_2$ fi $\{q\}$,

where $S_1, S_2$ are specifications from $L_S^B$, $e$ is a quantifier-free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if the following hold:

(i) $W$ is if $e$ then $W_1$ else $W_2$ fi for some $W_1, W_2 \in L_W$.
(ii) $W$ is partially correct with respect to $p$ and $q$.
(iii) $W_1$ is partially correct with respect to the specification $S_1$.
(iv) $W_2$ is partially correct with respect to the specification $S_2$.

e) If $S$ is a while specification,

$\{p\}$ while $e$ do $S_1$ od $\{q\}$,

where $S_1$ is a specification from $L_S^B$, $e$ is a quantifier-free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if the following hold:

(i) $W$ is while $e$ do $W_1$ od for some $W_1 \in L_W$.
(ii) $W$ is partially correct with respect to $p$ and $q$.
(iii) $W_1$ is partially correct with respect to the specification $S_1$.

Definition: Let $W$, $I$, $S$, $p$ and $q$ be as in the preceding definition. Then the formulas $p$ and $q$ are called, respectively, the pre-condition and post-condition associated with the specification $S$.

For example, if $S$ is the simple specification

$\{p\}\ \{q\}$,

then the pre- and post-conditions associated with $S$ are $p$ and $q$.

Definition: (Syntax of $L_A$) The set, $L_A^B$, of annotated programs for the basis $B$ is defined inductively as follows:

a) Assignment statement: If $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ x := t\ \{q\}$

is an annotated program.

b) Composed statement: If $A_1, A_2$ are annotated programs and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ A_1 ; A_2\ \{q\}$

is an annotated program.

c) Conditional statement: If $A_1, A_2$ are annotated programs, $p, q$ are formulas from $WFF_B$, and $e$ is a quantifier-free formula from $QFF_B$, then

$\{p\}$ if $e$ then $A_1$ else $A_2$ fi $\{q\}$

is an annotated program.

d) While statement: If $A_1$ is an annotated program, $p, q$ are formulas from $WFF_B$, and $e$ is a quantifier-free formula from $QFF_B$, then

$\{p\}$ while $e$ do $A_1$ od $\{q\}$

is an annotated program.

We make a distinction in the preceding definitions between the sets of all while-programs, $L_W^B$, specifications, $L_S^B$, and annotated programs, $L_A^B$, and the corresponding sets along with an interpretation, which we denote by $L_W$, $L_S$, and $L_A$, respectively.
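An annotated program carries a pre- and post-condition at every node and has no simple (open) components. Erasing the annotations therefore leaves exactly one while-program. This is presumably the intuition behind the result quoted in Section 3.2 that an annotated specification with $C \neq \emptyset$ has $|C| = 1$. The sketch below uses a hypothetical nested-tuple encoding and keeps formulas as strings. It erases the annotations and lists the Hoare formula attached to every sub-program, outermost first.

```python
from typing import List, Tuple

# Hypothetical encoding of annotated programs as nested tuples:
#   ("assign", p, x, t, q)     {p} x := t {q}
#   ("seq", p, A1, A2, q)      {p} A1 ; A2 {q}
#   ("if", p, e, A1, A2, q)    {p} if e then A1 else A2 fi {q}
#   ("while", p, e, A1, q)     {p} while e do A1 od {q}


def erase(a: tuple) -> str:
    """Drop every annotation, leaving the underlying while-program."""
    kind = a[0]
    if kind == "assign":
        return f"{a[2]} := {a[3]}"
    if kind == "seq":
        return f"{erase(a[2])} ; {erase(a[3])}"
    if kind == "if":
        return f"if {a[2]} then {erase(a[3])} else {erase(a[4])} fi"
    if kind == "while":
        return f"while {a[2]} do {erase(a[3])} od"
    raise ValueError(f"unknown node kind {kind!r}")


def hoare_formulas(a: tuple) -> List[Tuple[str, str, str]]:
    """(p, W, q) for the whole program and every sub-program, outermost first."""
    formulas = [(a[1], erase(a), a[-1])]
    for child in a[2:-1]:
        if isinstance(child, tuple):        # sub-programs; x, t and e are strings
            formulas.extend(hoare_formulas(child))
    return formulas
```

For `("seq", "x = 5", ("assign", "x = 5", "y", "x", "y = 5"), ("assign", "y = 5", "x", "0", "y = 5 and x = 0"), "y = 5 and x = 0")`, the erased program is `y := x ; x := 0`, and three Hoare formulas are listed: one for the whole program and one for each assignment.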

5.  Derivations  and  Partial  Correctness 

In this section we assume some definitions and results concerning Hoare logic and calculus. See [18] for more details and [1] for a survey of Hoare logic. We denote by $HF_B$ the set of all Hoare formulas,

$\{p\}\ W\ \{q\}$,

where $p, q \in WFF_B$ and $W \in L_W$ is a while-program. The theory of an interpretation $I$ of a basis $B$ for predicate logic (denoted by $Th(I)$) is the set of all formulas which are valid in $I$. Proofs of the following two lemmas appear in [2]. The first lemma shows the connection between partial correctness with respect to a simple specification and the existence of a derivation from the theory of an interpretation. The second lemma shows how to construct an abstract program from a simple specification.

Lemma: (Derivations from a Theory and Partial Correctness) Let $B$ be a basis for predicate logic and $I$ an interpretation of $B$. Let $S$ be the simple specification

$\{p\}\ \{q\}$.

It follows that for each Hoare formula $\{p\}\ W\ \{q\} \in HF_B$, if $Th(I) \vdash \{p\}\ W\ \{q\}$, then $W$ is partially correct with respect to the specification $S$.
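This lemma rests on the soundness of the Hoare calculus. Its simplest rule, the assignment axiom $\{q[t/x]\}\ x := t\ \{q\}$ of the standard Hoare calculus, can be spot-checked semantically. The sketch below is not a derivation system. It represents formulas and terms as Python functions on integer states, an assumption standing in for a fixed interpretation $I$. It builds the precondition $q[t/x]$ by evaluating $q$ in the updated state, then checks on sample states that every state satisfying $q[t/x]$ ends, after $x := t$, in a state satisfying $q$. The helper names are hypothetical.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, int]
Formula = Callable[[State], bool]
Term = Callable[[State], int]


def substitute(q: Formula, x: str, t: Term) -> Formula:
    """Semantic q[t/x]: evaluate q in the state where x holds the value of t."""
    return lambda s: q({**s, x: t(s)})


def assign(x: str, t: Term, s: State) -> State:
    """Execute x := t from state s."""
    return {**s, x: t(s)}


def axiom_holds_on(q: Formula, x: str, t: Term, states: Iterable[State]) -> bool:
    """Check {q[t/x]} x := t {q} on every sampled state satisfying q[t/x]."""
    pre = substitute(q, x, t)
    return all(q(assign(x, t, s)) for s in states if pre(s))
```

For $q \equiv y > 10$ and the assignment $y := x + 1$, the computed precondition holds in the state $x = 10, y = 0$ and fails in the state $x = 9, y = 0$, matching the textual substitution $x + 1 > 10$.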


Lemma: Let $S$ be the simple specification

$\{p\}\ \{q\}$,

and let

$$C = \{\, W \in L_W \mid Th(I) \vdash \{p\}\ W\ \{q\} \,\}.$$

Then $(S, C)$ is an abstract program.


We introduce a definition which is an extension of the notion of the deduction of a Hoare formula from a theory. This definition is used to associate a set $C$ of implementations with a specification $S$ from $L_S^B$. This section also contains a theorem which shows that the pair $(S, C)$ is an abstract program. This extends a similar result for simple specifications.


Definition:  (Deduction  Consistent  with  a 

Specification)  Let  B be  a basis  for  predicate  logic, 
W a while-program  from  Lw>  l an  interpretation 
of  the  basis  B,  S a specification  from  LB , and  p',  q', 
respectively,  the  pre-  and  post-conditions  associ- 
ated with  the  specification  S.  The  notion  that 
there  is  a deduction  from  Th(I)  to  the  Hoare  for- 
mula {p'}  W {q'}  consistent  with  S , denoted  by: 

Th( I)  {p'}  W {q% 

is  defined  inductively  (the  induction  being  on  the 
specification,  S)  as  follows: 


a) If S is a simple specification,

$\{p'\}\ \{q'\}$,

then

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$

if

(i) $Th(I) \vdash \{p'\}\ W\ \{q'\}$.


b) If S is an assignment specification,

$\{p'\}\ x := t\ \{q'\}$,

where x is a variable from V and t is a term from $T_B$, then

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$

if the following hold:

(i) W is $x := t$

(ii) $Th(I) \vdash \{p'\}\ W\ \{q'\}$.

c) If S is a composed specification,

$\{p'\}\ S_1 ; S_2\ \{q'\}$,

where $S_1$, $S_2$ are specifications from $L_B$, and $p_1$, $q_1$ and $p_2$, $q_2$ are the pre- and post-conditions associated with $S_1$ and $S_2$, respectively, then

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$

if the following hold:

(i) W is $W_1 ; W_2$ for some $W_1, W_2 \in L_W$

(ii) $Th(I) \vdash \{p'\}\ W\ \{q'\}$

(iii) $Th(I) \vdash_{S_1} \{p_1\}\ W_1\ \{q_1\}$

(iv) $Th(I) \vdash_{S_2} \{p_2\}\ W_2\ \{q_2\}$.

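d) If S is a conditional specification,

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$,

where $S_1$, $S_2$ are specifications from $L_B$, e is a quantifier-free formula from $QFF_B$, and $p_1$, $q_1$ and $p_2$, $q_2$ are the pre- and post-conditions associated with $S_1$ and $S_2$, respectively, then

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$

if the following hold:

(i) W is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L_W$

(ii) $Th(I) \vdash \{p'\}\ W\ \{q'\}$

(iii) $Th(I) \vdash_{S_1} \{p_1\}\ W_1\ \{q_1\}$

(iv) $Th(I) \vdash_{S_2} \{p_2\}\ W_2\ \{q_2\}$.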

e) If S is a while specification,

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$,

where $S_1$ is a specification from $L_B$, e is a quantifier-free formula from $QFF_B$, and $p_1$, $q_1$ are the pre- and post-conditions associated with $S_1$, then

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$

if the following hold:

(i) W is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_W$

(ii) $Th(I) \vdash \{p'\}\ W\ \{q'\}$

(iii) $Th(I) \vdash_{S_1} \{p_1\}\ W_1\ \{q_1\}$.
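The inductive definition above can be read as a recursive check over the structure of S. The sketch below is one possible rendering under assumptions that are not in the original: while-programs and specifications are encoded as small hypothetical Python dataclasses, formulas and terms are opaque strings, and derivability in the Hoare calculus, $Th(I) \vdash \{p\}\ W\ \{q\}$, is supplied by the caller as an oracle function `derivable(p, w, q)`, since the definition gives no procedure for it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical encodings of while-programs (L_W); formulas and terms are strings.
@dataclass(frozen=True)
class Assign:            # x := t
    x: str
    t: str

@dataclass(frozen=True)
class Seq:               # W1 ; W2
    first: Prog
    second: Prog

@dataclass(frozen=True)
class If:                # if e then W1 else W2 fi
    e: str
    then: Prog
    orelse: Prog

@dataclass(frozen=True)
class While:             # while e do W1 od
    e: str
    body: Prog

Prog = Union[Assign, Seq, If, While]

# Hypothetical encodings of specifications (L_B); each carries its pre and post.
@dataclass(frozen=True)
class SimpleSpec:        # {p} {q}
    pre: str
    post: str

@dataclass(frozen=True)
class AssignSpec:        # {p} x := t {q}
    pre: str
    x: str
    t: str
    post: str

@dataclass(frozen=True)
class SeqSpec:           # {p} S1 ; S2 {q}
    pre: str
    s1: Spec
    s2: Spec
    post: str

@dataclass(frozen=True)
class IfSpec:            # {p} if e then S1 else S2 fi {q}
    pre: str
    e: str
    s1: Spec
    s2: Spec
    post: str

@dataclass(frozen=True)
class WhileSpec:         # {p} while e do S1 od {q}
    pre: str
    e: str
    s1: Spec
    post: str

Spec = Union[SimpleSpec, AssignSpec, SeqSpec, IfSpec, WhileSpec]
Oracle = Callable[[str, Prog, str], bool]   # stands in for Th(I) |- {p} W {q}

def consistent(spec: Spec, w: Prog, derivable: Oracle) -> bool:
    """Th(I) |-_S {p'} W {q'}, by induction on S (cases a to e)."""
    # Every case requires Th(I) |- {p'} W {q'} for the pre/post of S itself.
    if not derivable(spec.pre, w, spec.post):
        return False
    if isinstance(spec, SimpleSpec):                       # case a)
        return True
    if isinstance(spec, AssignSpec):                       # case b)
        return w == Assign(spec.x, spec.t)
    if isinstance(spec, SeqSpec):                          # case c)
        return (isinstance(w, Seq)
                and consistent(spec.s1, w.first, derivable)
                and consistent(spec.s2, w.second, derivable))
    if isinstance(spec, IfSpec):                           # case d)
        return (isinstance(w, If) and w.e == spec.e
                and consistent(spec.s1, w.then, derivable)
                and consistent(spec.s2, w.orelse, derivable))
    if isinstance(spec, WhileSpec):                        # case e)
        return (isinstance(w, While) and w.e == spec.e
                and consistent(spec.s1, w.body, derivable))
    raise TypeError(f"not a specification: {spec!r}")

# Usage with a permissive oracle, so only the structural conditions matter.
always = lambda p, w, q: True
spec = SeqSpec("p", AssignSpec("p", "x", "1", "r"), SimpleSpec("r", "q"), "q")
prog = Seq(Assign("x", "1"), Assign("y", "x"))
print(consistent(spec, prog, always))               # True
print(consistent(spec, Assign("x", "1"), always))   # False: not of the form W1 ; W2
```

Passing the oracle as a parameter keeps the structural part of the definition (conditions (i), (iii), (iv)) separate from the logical part (condition (ii) and the base case), which the sketch does not attempt to decide.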

Lemma: Let $W \in L_W$, $S \in L_B$, and let $p'$, $q'$ be the pre- and post-conditions associated with S. If

$Th(I) \vdash_S \{p'\}\ W\ \{q'\}$,

then W is partially correct with respect to the specification S.

Proof: This is an immediate consequence of the preceding definition, the definition of correctness with respect to specifications, and the lemma on derivations from a theory and partial correctness.

Note that in the case that S is the simple specification

$\{p\}\ \{q\}$,

$Th(I) \vdash_S \{p\}\ W\ \{q\}$ reduces to $Th(I) \vdash \{p\}\ W\ \{q\}$.

Just as the notion of partial correctness with respect to specifications is an extension of the notion of partial correctness with respect to formulas, the notion of a deduction from a theory of an interpretation to a Hoare formula consistent with a specification is an extension of the notion of a deduction from a theory of an interpretation to a Hoare formula. The preceding lemma gives the connection between derivations consistent with specifications and partial correctness of while-programs with respect to specifications. We use the next theorem in the construction of abstract programs from specifications.

Theorem: Let $S \in L_B$ and let $p'$, $q'$ be the pre- and post-conditions associated with S. If

$C = \{\, W \in L_W \mid Th(I) \vdash_S \{p'\}\ W\ \{q'\} \,\}$,

then (S, C) is an abstract program.

Proof: We need to show that for each $W \in C$, W is partially correct with respect to S. This follows from the preceding lemma.

6. A Special Case of a Correct Development Step

We consider a somewhat simplified situation in which we wish to construct a correct development step. This is actually part of the basis step for an induction proof that the example does have the properties of the abstract model. We start with an abstract program A = (S, C) for which S has the form

$\{p\}\ \{q\}$,

where p and q are formulas from $WFF_B$; that is, S is a simple specification. C is the set of while-programs $W \in L_W$ for which there exists a deduction in the Hoare calculus from the theory of the interpretation of the predicate logic to the Hoare formula $\{p\}\ W\ \{q\}$ consistent with S; that is,

$C = \{\, W \in L_W \mid Th(I) \vdash_S \{p\}\ W\ \{q\} \,\}$.

From the abstract program (S, C), we construct a new abstract program,

$(S', C')$,

in which the specification S' and the set of while-programs C' are related to S and C. The relationship involves the transformation of S by changing the simple specification into another specification. Using the notation of the abstract model, we have a transformation on the specifications,

$T: S \to S'$.

In terms of the example of the formal development, the transformation can be expressed as

$T: \{p\}\ \{q\} \to \{p\}\ S_1\ \{q\}$

where $S_1 \in L_B$ is an assignment statement specification, a composed statement specification, a conditional statement specification, or a while statement specification. We give a formal definition of these transformations in this section.

Let S' be $\{p\}\ S_1\ \{q\}$. C' is the set of while-programs for which there exists a deduction in the Hoare calculus from the theory of the interpretation of the predicate logic to the Hoare formula $\{p\}\ W\ \{q\}$ consistent with S'; that is,

$C' = \{\, W \in L_W \mid Th(I) \vdash_{S'} \{p\}\ W\ \{q\} \,\}$.

We assume that $C \neq \emptyset$ and $C' \neq \emptyset$. This is an assumption that there exist while-programs which satisfy the specifications S and S'. Since we are constructing an example of an idealized development, these assumptions are reasonable restrictions on the specifications. There are four possibilities for C', depending upon the four kinds of transformation from $\{p\}\ \{q\}$ to $\{p\}\ S_1\ \{q\}$. In this section we introduce conditions under which it is possible to guarantee that a while-program $W \in L_W$ is in $C \cap C'$ for the case that T is a composed statement transformation. As a consequence of these conditions being satisfied, for each such transformation T and for each such while-program W, W is partially correct with respect to both S' and S.

Definition: (Specification Transformations) A transformation T from a simple specification S, which is $\{p\}\ \{q\}$, where p, q are formulas from $WFF_B$, to another specification S', which is the image of S under T, is defined as follows:

a) Assignment statement transformation. If x is a variable from V and t is a term from $T_B$, then

$T: \{p\}\ \{q\} \to \{p\}\ x := t\ \{q\}$.

b) Composed statement transformation. If $p_1$, $p_2$, $q_1$, $q_2$ are formulas from $WFF_B$, and $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$ are specifications, then

$T: \{p\}\ \{q\} \to \{p\}\ \{p_1\}\ \{q_1\} ; \{p_2\}\ \{q_2\}\ \{q\}$.

c) Conditional statement transformation. If $p_1$, $p_2$, $q_1$, $q_2$ are formulas from $WFF_B$, $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$ are specifications, and e is a quantifier-free formula from $QFF_B$, then

$T: \{p\}\ \{q\} \to \{p\}\ \mathbf{if}\ e\ \mathbf{then}\ \{p_1\}\ \{q_1\}\ \mathbf{else}\ \{p_2\}\ \{q_2\}\ \mathbf{fi}\ \{q\}$.

d) While statement transformation. If $p_1$, $q_1$ are formulas from $WFF_B$, $\{p_1\}\ \{q_1\}$ is a specification, and e is a quantifier-free formula from $QFF_B$, then

$T: \{p\}\ \{q\} \to \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ \{p_1\}\ \{q_1\}\ \mathbf{od}\ \{q\}$.
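To make the shape of the four transformations concrete, the following sketch encodes specifications as tagged tuples (a hypothetical encoding, not from the original) in which the second component is always the pre-condition and the last component is always the post-condition. Each function maps a simple specification $\{p\}\ \{q\}$ to its image under T; the usage lines check that p and q are carried over unchanged. The formulas used in the usage lines are arbitrary strings chosen for illustration.

```python
# Hypothetical encoding: (tag, pre, ..., post); formulas and terms are strings.
def simple(p, q):
    return ("simple", p, q)

def pre(s):
    return s[1]

def post(s):
    return s[-1]

def _unpack_simple(s):
    if s[0] != "simple":
        raise ValueError("T is defined only on simple specifications {p} {q}")
    return s[1], s[2]

def t_assign(s, x, t):                    # a) {p} x := t {q}
    p, q = _unpack_simple(s)
    return ("assign", p, x, t, q)

def t_compose(s, p1, q1, p2, q2):         # b) {p} {p1}{q1} ; {p2}{q2} {q}
    p, q = _unpack_simple(s)
    return ("seq", p, simple(p1, q1), simple(p2, q2), q)

def t_cond(s, e, p1, q1, p2, q2):         # c) {p} if e then {p1}{q1} else {p2}{q2} fi {q}
    p, q = _unpack_simple(s)
    return ("if", p, e, simple(p1, q1), simple(p2, q2), q)

def t_while(s, e, p1, q1):                # d) {p} while e do {p1}{q1} od {q}
    p, q = _unpack_simple(s)
    return ("while", p, e, simple(p1, q1), q)

S = simple("x >= 0", "y = x + 1")
images = [
    t_assign(S, "y", "x + 1"),
    t_compose(S, "x >= 0", "x >= 0", "x >= 0", "y = x + 1"),
    t_cond(S, "x = 0", "x = 0", "y = 1", "x > 0", "y = x + 1"),
    t_while(S, "y < x + 1", "y < x + 1", "y <= x + 1"),
]
for S1 in images:
    print(S1[0], pre(S1) == pre(S), post(S1) == post(S))   # e.g. "assign True True"
```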

We note that the pre- and post-conditions associated with both S and S' are p and q. Thus, the transformation

$T: S \to S'$

preserves pre- and post-conditions.

The lemma which follows gives conditions under which it is possible to have a derivation of a specific kind of Hoare formula. This Hoare formula is closely related to the composed statement specification transformation. We call these conditions proof rules, since they are sufficient to guarantee the existence of derivations in the Hoare calculus which will lead to a correct development step. Proofs for the other three kinds of specification transformation are given in [2]. Because of the way in which specifications are defined, these transformations are very similar to program transformations. See [19] for a general survey of program transformations.

Lemma: (Composed Statement Derivation) Let $T: S \to S'$ be a composed statement transformation,

$T: \{p\}\ \{q\} \to \{p\}\ \{p_1\}\ \{q_1\} ; \{p_2\}\ \{q_2\}\ \{q\}$.

Let $W \in L_W$. Suppose that W is

$W_1 ; W_2$

for some $W_1, W_2 \in L_W$. Let p, $p_1$, $p_2$, q, $q_1$, $q_2$ be formulas from $WFF_B$, and $\{p\}\ \{q\}$, $\{p_1\}\ \{q_1\}$, and $\{p_2\}\ \{q_2\}$ be specifications from $L_B$. Furthermore, assume that there exists a derivation of each of the following formulas from the theory of the interpretation I:

a) $p \Rightarrow p_1$

b) $q_1 \Rightarrow p_2$

c) $q_2 \Rightarrow q$

d) $\{p_1\}\ W_1\ \{q_1\}$ for some $W_1 \in L_W$

e) $\{p_2\}\ W_2\ \{q_2\}$ for some $W_2 \in L_W$.

Then $W \in C \cap C'$.

Proof: As a consequence of a) through e), there exists the following deduction in the Hoare calculus:

$Th(I) \vdash \{p\}\ W_1 ; W_2\ \{q\}$.

It follows that $W \in C$.

Let $S_1$ be $\{p_1\}\ \{q_1\}$ and $S_2$ be $\{p_2\}\ \{q_2\}$. If the following hold

i) W is $W_1 ; W_2$ for some $W_1, W_2 \in L_W$

ii) $Th(I) \vdash \{p\}\ W\ \{q\}$

iii) $Th(I) \vdash_{S_1} \{p_1\}\ W_1\ \{q_1\}$

iv) $Th(I) \vdash_{S_2} \{p_2\}\ W_2\ \{q_2\}$

then

$Th(I) \vdash_{S'} \{p\}\ W\ \{q\}$

and $W \in C'$. Condition i) holds by assumption. Condition ii) is a consequence of a) through e). Condition iii) follows from d) and the fact that

$Th(I) \vdash_{S_1} \{p_1\}\ W_1\ \{q_1\}$

is

$Th(I) \vdash \{p_1\}\ W_1\ \{q_1\}$.

Condition iv) follows from e) and the fact that

$Th(I) \vdash_{S_2} \{p_2\}\ W_2\ \{q_2\}$

is

$Th(I) \vdash \{p_2\}\ W_2\ \{q_2\}$.
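The deduction invoked at the start of this proof can be spelled out with two standard rules of the Hoare calculus, the rule of consequence and the composition rule. The derivation below is a sketch that assumes these rules are available in the calculus used here, as they are in the usual presentations; the labels refer to hypotheses a) through e) of the lemma.

```latex
% Rule of consequence: from p => p', {p'} W {q'}, q' => q infer {p} W {q}.
% Composition rule:    from {p} W1 {r} and {r} W2 {q} infer {p} W1 ; W2 {q}.
\[
\frac{p \Rightarrow p_1 \quad \{p_1\}\,W_1\,\{q_1\} \quad q_1 \Rightarrow q_1}
     {\{p\}\,W_1\,\{q_1\}}\ \text{(a, d)}
\qquad
\frac{q_1 \Rightarrow p_2 \quad \{p_2\}\,W_2\,\{q_2\} \quad q_2 \Rightarrow q}
     {\{q_1\}\,W_2\,\{q\}}\ \text{(b, e, c)}
\]
\[
\frac{\{p\}\,W_1\,\{q_1\} \qquad \{q_1\}\,W_2\,\{q\}}
     {\{p\}\,W_1 ; W_2\,\{q\}}\ \text{(composition)}
\]
```

The implication $q_1 \Rightarrow q_1$ used in the first step is valid in every interpretation, so it belongs to Th(I) without further assumptions.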

In this section we have shown that the existence of a simple specification transformation and the satisfaction of the proof rules imply that $W \in C' \cap C$. In [2] it is proved that, given $W \in C$, a transformation of a structured specification, and the satisfaction of the proof rules, $W \in C'$. This gives an explicit method for going from the higher-level specification in a development step to the lower-level (more detailed) specification. The fact that $C' \subseteq C$ follows from the existence of a specification transformation from S to S'. The proof appears in [2].

7.  Conclusion 

We have presented an example of a stepwise development methodology and have outlined the proof that it has the properties of an abstract model for stepwise development. We have also given some details of the approach used to prove that the example does have the properties of the model. Section 6 contains the proof for only one of the four cases needed in the basis of an induction proof of the correctness of a development step. We have not considered the induction step of that proof in this paper, although a complete proof is presented in [2]. In order to apply the methodology we do need a completeness result. If we use an expressive interpretation for the Hoare logic [9], [18], then we have the required completeness. The expressive interpretations are basically the finite interpretations and the interpretation of the usual arithmetic of nonnegative integers. The existence of expressive interpretations is considered in [17] and discussed in [8].

We are primarily interested in using the technique of an abstract model as an aid in constructing and reasoning about stepwise development methods. The example we have given shows that, even for the simple model we introduced, rather deep results concerning the deductive system (such as the existence of expressive interpretations for Hoare logic in the example presented) may be needed to prove that a methodology has the properties of the model.
Abstract

We present an abstract model for the stepwise development of programs. The model describes an idealized development which is independent of specification method and notion of correctness. We prove several results about the abstract model. These results show that the model possesses many of the properties that we would expect of an idealized stepwise development.

1.  Introduction 

Many approaches have been suggested to ease the difficulty of producing software components which satisfy given specifications; for example, in [4] the problem of designing an algorithm which meets a specification is discussed. In [8] an axiomatic approach to correctness proofs for programs is given, while in [11] and [5] stepwise approaches to program development are presented. The Vienna Development Method (VDM) [7] combines the notions of stepwise development and correctness proofs. In [8] a stepwise approach to software design methodology, which incorporates a notion of program correctness with respect to a specification, is discussed.

The stepwise development of programs which satisfy given specifications has been presented as a methodology. In order to reason about a particular design methodology, to compare different design methodologies, or to construct new design methodologies, it would be valuable to have an abstract mathematical model for the design process of developing verified software components by using a stepwise development method. The purpose of this paper is to construct such a model and to show that the model possesses the properties that one would intuitively expect of an idealized stepwise development method. The model not only provides a unifying conceptual foundation for stepwise program development methods; its properties also give insight into the stepwise development methods it describes. For example, implicit assumptions contained within certain stepwise development methods are actually theorems which are true for the abstract model, or consequences of definitions which are used to construct the abstract model. In the consideration of the example of a software component development method, it becomes clear exactly what properties the example must have in order to agree with the model. This is extremely useful, since a stepwise development method which does not have these properties cannot be guaranteed to behave like the abstract model.

2.  The  Abstract  Model 

We give basic definitions and results which we use to construct the abstract model in section 2.1. In section 2.2 we introduce definitions of an abstract program and three classes of developments. In section 2.3 we obtain some results about these classes of developments. In section 2.4 we introduce definitions of classes of development steps, and we show how developments can be extended with development steps in section 2.5. In section 2.6 we show that developments can be constructed from development steps, so that development steps can be viewed as “building blocks” for developments.


2.1.  Foundation  for  the  Abstract  Model 

In this section we present some basic definitions and a theorem which we use to construct the model. Most of these appear in [10]. We mention the notion of an implementation being correct with respect to a specification. For the purposes of constructing the abstract model and investigating its properties, we do not explicitly define correctness with respect to a specification. Also, we do not precisely define what we mean by a specification for a software component. The basic idea is to investigate the properties of an abstract model for stepwise development which are independent of the specification method and notion of correctness. The purpose of the abstract model is to provide a framework to study an incremental development method for a particular approach to specification and a particular notion of correctness. This enables us to distinguish between those properties which are characteristic of an incremental development and those properties which are intrinsic to a specific incremental development method.


2.1.1. Notation: Let

$Bool = \{\, true, false \,\}$

$SPEC = \{\, S \mid S \text{ is a specification} \,\}$

$IMPL = \{\, p \mid p \text{ is an implementation} \,\}$.

2.1.2. Definition: Let $f: SPEC \times IMPL \to Bool$ be defined as follows:

$f(S, p) = \begin{cases} true & \text{if } p \text{ is correct with respect to } S \\ false & \text{otherwise.} \end{cases}$


2.1.3. Definition: Let S be an element of SPEC. Let $C = \{\, p \in IMPL \mid f(S, p) = true \,\}$.

It may be that $C = \emptyset$, that is, S is a specification for which there exists no correct implementation. This can occur if, for example, the specification S is inconsistent.
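The definitions of f and C can be illustrated on a toy instance. In the sketch below, which is an assumption for illustration only (the model deliberately leaves specifications and correctness undefined), a specification is a predicate relating an input to an output, an implementation is a function on integers, and correctness is checked only over a small finite test domain. The second specification is inconsistent, so its set C is empty.

```python
# Hypothetical toy instance: specifications are predicates spec(x, y) on an
# input x and an output y; correctness is checked only on a finite DOMAIN.
DOMAIN = range(-3, 4)

IMPL = {
    "abs": abs,
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

def f(spec, impl) -> bool:
    """f(S, p): True iff p is correct with respect to S (here: on DOMAIN)."""
    return all(spec(x, impl(x)) for x in DOMAIN)

def correct_impls(spec) -> set:
    """C = { p in IMPL | f(S, p) = true }, returned as a set of names."""
    return {name for name, p in IMPL.items() if f(spec, p)}

nonneg = lambda x, y: y >= 0                # S: the output is non-negative
contradictory = lambda x, y: y > 0 and y < 0  # S: inconsistent, nothing satisfies it

print(sorted(correct_impls(nonneg)))        # ['abs', 'square']
print(correct_impls(contradictory))         # set()
```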

2.1.4. Definition: A partial order is a pair (P, R) where P is a set and R is a relation on P which is reflexive (for all $a \in P$, aRa), antisymmetric (for all $a, b \in P$, aRb and bRa implies that a = b), and transitive (for all $a, b, c \in P$, aRb and bRc implies that aRc).

2.1.5. Definition: Let (P, R) be a partial order and S a nonempty subset of P. S is called a chain if aRb or bRa (or both) holds for all $a, b \in S$. This simply means that the relation R restricted to S is total.

2.1.6. Definition: Let (P, R) be a partial order and S be a subset of P. Let $a \in S$. The element a is a least element of S if, for all $b \in S$, aRb.

2.1.7. Definition: Let (P, R) be a partial order and S a subset of P. An element a of P is an upper bound of S (in P) if bRa for all $b \in S$. An element a of P is the least upper bound (lub) of S if a is the least element of the set of upper bounds of S in P.

2.1.8. Definition: A partial order (P, R) is a complete partial order, denoted by cpo, if the following two conditions hold:

(1) The set P has a least element.

(2) For every chain S in P, the least upper bound lub S exists.

2.1.9. Theorem: Every partial order which contains a least element and contains only finite chains is a cpo.

2.1.10. Notation: Let S be a set. By $\mathcal{P}(S)$ we mean the power set of S.

2.1.11. Lemma: If $C_i$ is a set of implementations which are correct with respect to the specification $S_i$ for some integer i, $i \geq 0$, then the ordered pair $(\mathcal{P}(C_i), \subseteq)$ is a partial order.

Proof: This follows from the fact that the relation $\subseteq$ is reflexive, antisymmetric, and transitive.
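Definitions 2.1.4 through 2.1.7 and lemma 2.1.11 can be checked mechanically on a small finite example. The sketch below assumes a hypothetical three-element set of correct implementations, builds its power set, and verifies that set inclusion is a partial order on it. It also checks that a nested family of subsets is a chain and that the largest member of a finite chain is its least upper bound, which is why finite chains pose no difficulty in theorem 2.1.9.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets (the power set of definition 2.1.10)."""
    items = sorted(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

def is_partial_order(P, R):
    """Definition 2.1.4: R is reflexive, antisymmetric and transitive on P."""
    reflexive = all(R(a, a) for a in P)
    antisymmetric = all(a == b for a in P for b in P if R(a, b) and R(b, a))
    transitive = all(R(a, c) for a in P for b in P for c in P
                     if R(a, b) and R(b, c))
    return reflexive and antisymmetric and transitive

def is_chain(S, R):
    """Definition 2.1.5: S is nonempty and any two elements are comparable."""
    return len(S) > 0 and all(R(a, b) or R(b, a) for a in S for b in S)

def lub_of_finite_chain(S, R):
    """For a finite chain, its greatest element is the least upper bound."""
    for a in S:
        if all(R(b, a) for b in S):
            return a
    return None  # not reached when S is a nonempty finite chain

subset = lambda a, b: a <= b                # set inclusion on frozensets
C_i = {"p1", "p2", "p3"}                    # hypothetical correct implementations
P = powerset(C_i)
S = [frozenset(), frozenset({"p1"}), frozenset({"p1", "p3"})]
print(len(P), is_partial_order(P, subset))                          # 8 True
print(is_chain(S, subset), sorted(lub_of_finite_chain(S, subset)))  # True ['p1', 'p3']
```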
2.2. Classes of Developments

In this section we introduce definitions of an abstract program, a development with respect to a specification, a correct development, a complete development, and an incomplete development. The formal definitions concerning a development and the various classifications of developments correspond to the intuitive notions of development and kinds of developments which occur in a stepwise development process.

2.2.1. Definition: An abstract program A is an ordered pair (S, C) such that the first member of the pair, S, is a specification, and the second member of the pair, C, is a set of implementations which are correct with respect to S.

2.2.2. Definition: A development with respect to a specification $S_0$ is an (n + 1)-tuple of abstract programs, $(A_0, A_1, \ldots, A_n)$, for some nonnegative integer n such that for each i, $0 \leq i \leq n$, $A_i = (S_i, C_i)$.

In the discussion of developments, we shall usually write out the abstract programs explicitly as ordered pairs of specifications and sets of implementations, since it is the interaction of the components of the ordered pair, rather than the abstract programs themselves, which is of interest.

2.2.3. Notation: Let S be a set. By $|S|$ we mean the cardinality of S.

2.2.4. Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is correct if $C_{i+1} \subseteq C_i$, $0 \leq i < n$.

It is possible that a correct development with respect to a specification $S_0$ will have a set of implementations $C_i$ for which $C_i = \emptyset$. Then all $C_j$, for $j > i$, will also be equal to the empty set. These kinds of developments will not be of interest in themselves, since they do not lead to an implementation which is correct with respect to the original specification $S_0$. What is needed is an additional property for developments, which will ensure that the sets of implementations associated with the specifications in the development will all be nonempty. There are two properties of developments which enable us to describe the kinds of developments that we wish to consider. These two properties are independent of the correctness of a development, but will be used only in conjunction with correct developments. A correct development which is complete is a development ending in a single implementation which is correct with respect to the specification with which it is associated. A correct development which is incomplete is a development which may be extended (in a sense which will be made precise later) to form a correct and complete development. We define the notions of a complete development and an incomplete development more precisely.

2.2.5. Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is complete if $|C_n| = 1$.

2.2.6. Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is incomplete if $|C_n| > 1$.
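Definitions 2.2.4 through 2.2.6 are simple conditions on the sets $C_i$. The sketch below encodes a development as a Python list of (specification, implementation set) pairs, a hypothetical encoding in which specifications are just labels and implementations are strings, and checks each condition directly.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: A_i = (S_i, C_i) with S_i a label and C_i a frozenset.
Development = List[Tuple[str, FrozenSet[str]]]

def is_correct(d: Development) -> bool:
    """2.2.4: C_{i+1} is a subset of C_i for 0 <= i < n."""
    return all(d[i + 1][1] <= d[i][1] for i in range(len(d) - 1))

def is_complete(d: Development) -> bool:
    """2.2.5: |C_n| = 1."""
    return len(d[-1][1]) == 1

def is_incomplete(d: Development) -> bool:
    """2.2.6: |C_n| > 1."""
    return len(d[-1][1]) > 1

D = [("S0", frozenset({"a", "b", "c"})),
     ("S1", frozenset({"a", "b"})),
     ("S2", frozenset({"a"}))]
print(is_correct(D), is_complete(D))        # True True
print(is_incomplete(D[:2]))                 # True: it ends with |C_1| = 2
```

A development whose last set is empty is neither complete nor incomplete under these checks, which matches the intent of excluding such developments from both classes.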

2.3.  Properties  of  Classes  of  Developments 

In this section we show that the sets of implementations associated with correct developments and with correct and complete developments form nonempty, finite, nested families. This leads to theorem 2.3.2, which states that an implementation which is correct with respect to a specification is also correct with respect to all preceding specifications in a development. The last result of this section states that the sets of implementations associated with a correct and complete development form a cpo when ordered by set inclusion.

2.3.1. Theorem: Let D be a correct and complete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. It follows that:

(1) for each integer i, $0 \leq i \leq n$, $C_i$ is a subset of the set of all implementations which are correct with respect to the specification $S_i$

(2) $C_{i+1} \subseteq C_i$, $0 \leq i < n$

(3) $|C_n| = 1$.

Proof: Property (1) follows from the fact that D is a development. Property (2) follows from the fact that D is correct, and property (3) holds because D is complete.
As a direct consequence of the preceding theorem, if D is a correct and complete development or a correct and incomplete development, then for each i, $0 \leq i \leq n$, $C_i \neq \emptyset$: in either case $|C_n| \geq 1$, and $C_n \subseteq C_i$ because D is correct.

2.3.2. Theorem: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a correct development with respect to the specification $S_0$. If $p \in C_i$ for some integer i, $0 \leq i \leq n$, then p is correct with respect to all specifications $S_j$, $0 \leq j \leq i$.

Proof: If $p \in C_i$ for some integer i, $0 \leq i \leq n$, then $p \in C_j$ for all integers j, $0 \leq j \leq i$, from the definition of a correct development. Since each $C_j$ is a set of implementations which are correct with respect to $S_j$, the result follows.

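2.3.3. Lemma: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a correct development with respect to a specification $S_0$. Let $S = \{C_0, C_1, \ldots, C_n\}$. Then S is a finite chain in $(\mathcal{P}(C_0), \subseteq)$.

Proof: Clearly, S is finite. Since $C_n \subseteq \cdots \subseteq C_1 \subseteq C_0$, every member of S lies in $\mathcal{P}(C_0)$, and the order relation $\subseteq$ restricted to the set S is total.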

2.3.4. Definition: We call the set S a finite chain associated with the correct development.

2.3.5. Notation: Let DV be the union of $\{\emptyset\}$ with the collection of all finite chains associated with all correct and complete developments with respect to a specification $S_0$.

2.3.6. Theorem: The ordered pair $(DV, \subseteq)$ is a cpo.

Proof: By lemma 2.1.11, $(\mathcal{P}(C_0), \subseteq)$ is a partial order. It follows that $(DV, \subseteq)$ is a partial order. The least element of DV is $\emptyset$. By the preceding lemma and the fact that any correct and complete development with respect to the specification $S_0$ is also a finite chain in $(DV, \subseteq)$, $(DV, \subseteq)$ contains only finite chains. By theorem 2.1.9, $(DV, \subseteq)$ is a cpo.

2.4.  Classes  of  Development  Steps 

In this section we introduce the notion of a development step and several classifications of development steps. We classify development steps as correct, incomplete, and complete. Correct development steps are those development steps that have properties which make them suitable for use in constructing correct developments. Incomplete development steps are used in constructing all but the final step in a development, while a complete development step is used as the final step in the construction of a development. The notion of a development step is fundamental, since it is the concept which describes the result of a process of going from one abstract program to another.

2.4.1. Definition: A development step with respect to a specification $S_i$ is an ordered pair of abstract programs, $(A_i, A_{i+1})$, such that $A_i = (S_i, C_i)$ and $A_{i+1} = (S_{i+1}, C_{i+1})$.

For the same reasons presented in the discussion of developments, in the discussion of development steps we shall usually write out the abstract programs explicitly as ordered pairs of specifications and sets of implementations.

2.4.2. Definition: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a development with respect to a specification $S_0$. Let $((S_j, C_j), (S_{j+1}, C_{j+1}))$ be a development step with respect to the specification $S_j$. The development contains the development step if j = i for some integer i, $0 \leq i \leq n - 1$, that is, the development step is $((S_i, C_i), (S_{i+1}, C_{i+1}))$, where $(S_i, C_i)$ and $(S_{i+1}, C_{i+1})$ are successive members of the (n + 1)-tuple which is the development with respect to the specification $S_0$.

2.4.3. Definition: A development step with respect to a specification $S_i$ for some nonnegative integer i, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is correct if the following hold:

(1) $C_i, C_{i+1} \neq \emptyset$

(2) $C_{i+1} \subseteq C_i$.

2.4.4. Definition: A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is complete if $|C_{i+1}| = 1$.

2.4.5. Definition: A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is incomplete if $|C_{i+1}| > 1$.
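In a hypothetical encoding where an abstract program is a (label, frozenset) pair, a development step is a pair of abstract programs, and definitions 2.4.2 through 2.4.5 become the following checks. This is a sketch only; in particular, `contains_step` tests whether the step occurs as a pair of successive members of the development, which is how we read definition 2.4.2.

```python
def step_is_correct(step):
    """2.4.3: C_i and C_{i+1} are nonempty and C_{i+1} is a subset of C_i."""
    (_, c_i), (_, c_next) = step
    return bool(c_i) and bool(c_next) and c_next <= c_i

def step_is_complete(step):
    """2.4.4: |C_{i+1}| = 1."""
    return len(step[1][1]) == 1

def step_is_incomplete(step):
    """2.4.5: |C_{i+1}| > 1."""
    return len(step[1][1]) > 1

def contains_step(development, step):
    """2.4.2: the step occurs as two successive members of the development."""
    return any((development[i], development[i + 1]) == step
               for i in range(len(development) - 1))

A0 = ("S0", frozenset({"a", "b", "c"}))
A1 = ("S1", frozenset({"a", "b"}))
A2 = ("S2", frozenset({"a"}))
print(step_is_correct((A0, A1)), step_is_incomplete((A0, A1)))   # True True
print(step_is_correct((A1, A2)), step_is_complete((A1, A2)))     # True True
print(contains_step([A0, A1, A2], (A1, A2)))                     # True
print(step_is_correct((A2, A0)))                                 # False
```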

2.5.  The  Extension  of  Developments  with  Development  Steps 

The results in this section show that developments can be extended by development steps to form new developments. The resulting new developments have properties which depend upon the original developments and the development steps.

2.5.1. Lemma: Let D be a development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a development step with respect to $S_n$. Let $D_1$ be the ordered (n + 2)-tuple $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a development with respect to the specification $S_0$ which contains the given development step.

Proof: The ordered (n + 2)-tuple $D_1$ is a development with respect to the specification $S_0$, since it can be shown that

(1) for each i, $0 \leq i \leq n$, $C_i$ is a set of implementations which are correct with respect to the specification $S_i$

(2) $C_{n+1}$ is a set of implementations which are correct with respect to the specification $S_{n+1}$.

Property (1) follows from the assumption that D is a development, while property (2) follows from the assumption that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a development step with respect to $S_n$. The development $D_1$ clearly contains the given development step $((S_n, C_n), (S_{n+1}, C_{n+1}))$.

2.5.2. Lemma: Let D be a correct development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a correct development step with respect to $S_n$. Then $D_1$, the ordered (n + 2)-tuple $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$, is a correct development with respect to the specification $S_0$ which contains the given development step.

Proof: From the preceding lemma, $D_1$ is a development with respect to the specification $S_0$ which contains the given development step. $D_1$ is a correct development, since it can be shown that

(1) $C_{i+1} \subseteq C_i$, $0 \leq i < n$

(2) $C_{n+1} \subseteq C_n$.

Property (1) follows from the assumption that D is correct, and property (2) follows from the assumption that the development step $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a correct development step with respect to $S_n$.

2.5.3. Theorem: Let D be a correct development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete and correct development step with respect to $S_n$. Let $D_1$ be $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a correct and complete development with respect to the specification $S_0$ which contains the given development step.

Proof: From the preceding lemma, $D_1$ is a correct development with respect to the specification $S_0$ which contains the given development step. Because $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete development step with respect to $S_n$, it follows that $|C_{n+1}| = 1$. This shows that the development is complete.

2.5.4. Corollary: Let D be a correct and incomplete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete and correct development step with respect to $S_n$. Let $D_1$ be $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a correct and complete development with respect to the specification $S_0$ which contains the given development step.
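The extension results can be exercised in the same hypothetical list-of-pairs encoding, where an abstract program is a (label, frozenset) pair. The sketch below appends the second abstract program of a step to a development whose last member is the first abstract program of the step, as in the first lemma of this section, and then checks the situation of the corollary: a correct, incomplete development extended by a correct, complete step yields a correct, complete development.

```python
def is_correct(d):
    """Definition 2.2.4: each C_{i+1} is a subset of C_i."""
    return all(d[i + 1][1] <= d[i][1] for i in range(len(d) - 1))

def step_is_correct(step):
    """Definition 2.4.3: both sets nonempty and C_{n+1} a subset of C_n."""
    (_, c), (_, c_next) = step
    return bool(c) and bool(c_next) and c_next <= c

def extend(development, step):
    """Form (A_0, ..., A_n, A_{n+1}) from D and the step (A_n, A_{n+1})."""
    if step[0] != development[-1]:
        raise ValueError("the step must start at the last member A_n of D")
    return development + [step[1]]

D = [("S0", frozenset({"a", "b", "c"})), ("S1", frozenset({"a", "b"}))]  # correct, incomplete
step = (D[-1], ("S2", frozenset({"b"})))                               # correct, complete
D1 = extend(D, step)
print(len(D1), is_correct(D1), len(D1[-1][1]) == 1)   # 3 True True
```

Returning a new list rather than mutating D mirrors the statement of the results, where D and $D_1$ are distinct tuples.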

2.6.  The  Construction  of  Developments  from  Development  Steps 

In this section we show that developments can be constructed from development steps. The properties
of the resulting developments depend upon the properties of the development steps used in their
construction.

2.6.1. Lemma. Let $((S_0, C_0), (S_1, C_1)), ((S_1, C_1), (S_2, C_2)), \ldots, ((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of $n$
correct development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$ respectively, for some positive
integer $n$. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a correct development with respect to the
specification $S_0$.

Proof. $D$ is a development with respect to the specification $S_0$ from the definition of a development step.
Since $((S_i, C_i), (S_{i+1}, C_{i+1}))$ is a correct development step with respect to the specification $S_i$ for each integer
$i$, $0 \le i < n$, we have $C_{i+1} \subseteq C_i$. It follows that $D$ is a correct development.

2.6.2. Theorem. Let $((S_0, C_0), (S_1, C_1)), ((S_1, C_1), (S_2, C_2)), \ldots, ((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of
$n$ correct development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$ respectively, for some positive
integer $n$. Furthermore, suppose that $((S_{n-1}, C_{n-1}), (S_n, C_n))$ is a complete development step. Let
$D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a correct and complete development with respect to the
specification $S_0$.

Proof. From the preceding lemma, $D$ is a correct development with respect to the specification $S_0$. Since
$((S_{n-1}, C_{n-1}), (S_n, C_n))$ is a complete development step, $|C_n| = 1$. It follows that $D$ is a complete
development.
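The following Python sketch illustrates Lemma 2.6.1 and Theorem 2.6.2 on a toy example. It is not part of the report: it assumes, purely for illustration, that an abstract program is encoded as a pair of a specification name and a finite `frozenset` of implementation labels, and the helper names (`chain_steps`, `is_correct_step`, and so on) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding (not from the report): an abstract program (S_i, C_i)
# is a pair of a specification name and a finite set of implementation labels.
AbstractProgram = Tuple[str, FrozenSet[str]]
Step = Tuple[AbstractProgram, AbstractProgram]


def is_correct_step(step: Step) -> bool:
    """C_i and C_{i+1} are nonempty and C_{i+1} is a subset of C_i."""
    (_, c_i), (_, c_next) = step
    return len(c_i) > 0 and len(c_next) > 0 and c_next <= c_i


def is_complete_step(step: Step) -> bool:
    """The second abstract program has exactly one implementation."""
    return len(step[1][1]) == 1


def chain_steps(steps: List[Step]) -> List[AbstractProgram]:
    """Join ((S_0,C_0),(S_1,C_1)), ((S_1,C_1),(S_2,C_2)), ... into the
    development ((S_0,C_0), (S_1,C_1), ..., (S_n,C_n))."""
    development = [steps[0][0]]
    for before, after in steps:
        if before != development[-1]:
            raise ValueError("consecutive steps do not share an abstract program")
        development.append(after)
    return development


def is_correct(development: List[AbstractProgram]) -> bool:
    return all(development[i + 1][1] <= development[i][1]
               for i in range(len(development) - 1))


def is_complete(development: List[AbstractProgram]) -> bool:
    return len(development[-1][1]) == 1


A0 = ("S0", frozenset({"p1", "p2", "p3"}))
A1 = ("S1", frozenset({"p1", "p2"}))
A2 = ("S2", frozenset({"p2"}))
steps = [(A0, A1), (A1, A2)]

D = chain_steps(steps)
print(all(is_correct_step(s) for s in steps), is_complete_step(steps[-1]))  # True True
print(is_correct(D), is_complete(D))  # True True
```

`chain_steps` raises an error when consecutive steps do not share an abstract program, since such steps cannot be assembled into a single development.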

3.  Conclusion 

The abstract model describes the process of starting with a specification for a software component
and, through a series of steps, ending with an implementation which is correct with respect to the original
specification. The model has been used to construct an example of a stepwise development method in [1].
This example uses specifications in terms of pre- and post-conditions [10]. The notion of correctness
which is used is an extension of partial correctness with respect to formulas from predicate logic. In the
construction of the example, the Hoare calculus had to be extended from a program verification method to a
stepwise program development method. Conceptually, this extension was fairly easy to make
because of the existence of the abstract model. On the other hand, the proof that an example of a stepwise
development method actually satisfies the requirements of the abstract model may depend upon non-trivial
properties of the example. For the example that we have studied, the proof depends upon the relative
completeness of the Hoare calculus [3] and the existence of expressive interpretations for the Hoare
logic [10], [2], [9].

Acknowledgement. The author would like to acknowledge many helpful discussions with Bob Terwilliger.


4. References

1. Benzinger, Lee A. Toward a Theory of the Stepwise Development of Programs. In preparation, Dept. of Computer Science, University of Illinois, 1986.

2. Clarke, E. M., Jr., S. M. German and J. Y. Halpern. On Effective Axiomatizations of Hoare Logics. Proceedings of the 9th ACM Symposium on Principles of Programming Languages (January 1982) pp. 309-321.

3. Cook, S. A. Soundness and Completeness of an Axiom System for Program Verification. SIAM Journal on Computing (February 1978) vol. 7, no. 1, pp. 70-90.

4. Dijkstra, Edsger W. A Constructive Approach to the Problem of Program Correctness. BIT (1968) pp. 174-186.

5. Gries, David. The Science of Programming. Springer-Verlag, New York, 1981.

6. Hoare, C. A. R. An Axiomatic Basis for Computer Programming. Communications of the ACM (October 1969) vol. 12, no. 10, pp. 576-580.

7. Jones, Cliff B. Software Development: A Rigorous Approach. Prentice-Hall International, Englewood Cliffs, N.J., 1980.

8. Lehman, M. M., V. Stenning and W. M. Turski. Another Look at Software Design Methodology. Software Engineering Notes (April 1984) vol. 9, no. 2, pp. 33-53.

9. Lipton, R. J. A Necessary and Sufficient Condition for the Existence of Hoare Logics. Proceedings of the 18th IEEE Symposium on Foundations of Computer Science (October 1977) pp. 1-6.

10. Loeckx, Jacques and Kurt Sieber. The Foundations of Program Verification. John Wiley & Sons, New York, 1984.

11. Wirth, Niklaus. Program Development by Stepwise Refinement. Communications of the ACM (April 1971) vol. 14, no. 4, pp. 221-227.




SAGA  Project  Mid-Year  Report  1986 


Appendix  H 


Toward  a Theory  of  the  Stepwise  Development  of  Programs 


Lee  A.  Benzinger 


Department  of  Computer  Science 
University  of  Illinois  at  Urbana-Champaign 


Urbana,  Illinois 


September  16,  1986 


DRAFT 


Toward  a Theory  of  the 
Stepwise  Development  of  Programs 


PAPER 


Lee  A.  Benzinger 


Department  of  Computer  Science 
University  of  Illinois  at  Urbana-Champaign 
1304  W.  Springfield  Ave. 

Urbana,  Illinois  61801 
217-333-8426 


Abstract 

We present an abstract model for the stepwise development of programs. The
model describes an idealized development which is independent of specification
method and notion of correctness. We illustrate the abstract model with an
example of a program development method using Hoare calculus and partial
correctness of programs with respect to specifications which are in terms of
pre- and post-conditions.


This  research  is  supported  in  part  by  NASA  grant  NAG  1-138. 




Table of Contents

1. Introduction

2. The Abstract Model
2.1. Preliminary Definitions
2.2. The Construction of the Model
2.3. Classes of Developments
2.4. Properties of Classes of Developments
2.5. Classes of Development Steps
2.6. The Extension of Developments with Development Steps
2.7. The Construction of Developments from Development Steps

3. An Example of a Formal Development
3.1. Preliminary Definitions
3.2. The Construction of the Example
3.3. The Hoare Logic and Calculus

4. A Development
4.1. Derivations and Partial Correctness
4.2. The Construction of an Abstract Program
4.3. The Construction of a Development

5. A Correct Development Step
5.1. Specifications and Annotated Programs
5.2. A Special Case of a Correct Development Step
5.3. Specification Transformations
5.4. The General Case for Transformation Proof Rules
5.5. Sets of Implementations Related by Set Inclusion
5.6. Obtaining an Implementation Using Proof Rules

6. Conclusions

7. References


July  29,  1986 




1. Introduction

It  is  a difficult  task  to  develop  a software  component  which  satisfies  a given  specification.  If 
the  specification  is  not  precise,  as  in  the  case  of  a specification  in  terms  of  natural  language,  the 
ambiguities  in  the  specification  can  create  confusion  as  to  the  meaning  of  the  specification  and 
the  intent  of  the  specifier.  The  introduction  of  a formal  specification,  which  uses  well-defined 
notation,  can  eliminate  ambiguities.  A disadvantage  of  such  a formal  specification  is  that  it  may 
be  more  difficult  to  understand,  simply  due  to  notation,  than  a less  formal  specification. 

The  matter  of  showing  that  a software  component  actually  satisfies  a specification  can  be 
accomplished  through  testing  or  verification.  The  testing  approach  does  not,  in  general,  provide  a 
guarantee that a software component satisfies a specification, while verification quickly becomes a
formidable problem as the size and complexity of a component increase. In addition, verification
requires  that  a specification  be  expressed  in  some  formal  manner. 

The  task  of  developing  a software  component  which  satisfies  a given  specification  can  be 
simplified  by  breaking  the  task  into  a series  of  subtasks  or  “steps”.  Associated  with  each  step  is  a 
specification  and  a software  component.  Initially,  the  software  component  is  nothing  more  than 
the  original  specification.  At  each  step,  the  specification  associated  with  the  step  is  more  detailed 
than  the  specifications  associated  with  preceding  steps.  In  addition,  the  specification  associated 
with  a particular  step  is  consistent  with  the  specifications  associated  with  preceding  steps.  The 
software components corresponding to these specifications become increasingly detailed as
the stepwise process proceeds. At the final step in the process, the result is a final specification
and a final software component. This final specification is consistent with all preceding
specifications, including the original specification. The corresponding final software component is
an implementation which not only satisfies the specification associated with the final step, but
also satisfies the original specification. This technique is used by the Vienna Development
Method [2].

In  the  Vienna  Development  Method  the  processes  of  developing  a software  component  and 
verifying  that  it  actually  satisfies  the  requirements  of  the  original  specification  proceed  hand  in 
hand.  This  has  the  dual  advantage  of  the  development  process  aiding  the  verification  process  and 
the  verification  process,  in  turn,  aiding  the  development  process.  The  development  process, 
because  of  its  incremental  nature,  breaks  the  verification  process  into  smaller  parts,  each  of 
which  is  a piece  of  the  total  problem  of  verifying  that  the  implementation  which  results  from  the 
development  actually  satisfies  the  original  specification.  The  verification  process,  because  of  its 
incremental nature, aids in the development process. Indeed, the part of a software component
under development which may not satisfy the requirements of its specification is at most one
logical step away from a component under development which is known to satisfy its specification.
Each step in the development either eventually yields an implementation which satisfies
the original specification, or it leads to backtracking, which in turn eventually yields an implementation
which satisfies the original specification. This approach can reduce the time and cost of
developing reliable software, since design decisions can be checked for correctness in the middle of
the development process and can be changed precisely at the point in the development of a
software component which is affected by these decisions.

In order to reason about a particular design approach or to compare different design
methods, it would be valuable to have an abstract model for the design process of developing
verified software components by using a stepwise development method. The purpose of this
paper is to construct such a model, to show that the model possesses the properties that one
would expect of a stepwise development method, and finally, to show an example of a particular
software component development method that falls within the framework provided by the model.

The model not only provides a unifying conceptual foundation for stepwise program
development methods; its properties also give insight into the stepwise development
methods described by the model. For example, implicit assumptions contained within certain
stepwise development methods are actually theorems which are true for the abstract model or
consequences of definitions which are used to construct the abstract model. In the consideration
of the example of a software component development method, it becomes clear exactly what
properties the example must have in order to agree with the model. This is extremely useful, since a
stepwise development method which does not have these properties cannot be guaranteed to
behave like the abstract model.

2. The Abstract Model

In this section we introduce definitions which we use to construct the model. We also
obtain some results about the model. Definitions and theorems which are standard are not
numbered. Most of the standard definitions and theorems appear in [3]. Definitions introduced in the
construction of the model, and theorems, lemmas, and corollaries which we prove concerning the
model, are numbered. The definitions which we introduce include the notions of an abstract
program, a development with respect to a given specification, a correct development, a complete
development, an incomplete development, a development step, a correct development step, and a
complete development step.

There are five main results in this section. The first result is Theorem 1, which gives three
properties of a correct and complete development. These are properties that one would intuitively
expect of an idealized development of a software component which starts with a specification and
ends with a verified implementation.

Given a specification for a software component, we can associate with it a set of
implementations which are correct with respect to the specification. We define another set, denoted by
$DV$, of subsets of the set of implementations. The second result is Theorem 3: the ordered
pair $(DV, \subseteq)$, where $\subseteq$ is the set inclusion relation on the elements of $DV$, is a complete partial
order. This result shows that $(DV, \subseteq)$ has a well-understood structure, in addition to the more
obvious “chain structure” of sets of implementations restricted to a particular development.

The third result is that correct and complete developments with respect to a given
specification can be obtained from a correct development followed by a correct and complete
development step. This shows the relationship between a correct development and a new
development which is obtained from the original development by adding a correct and complete
development step. The relationship is an immediate consequence of Theorem 4.

The fourth result of this section shows that the correctness of an implementation of a
software component with respect to the original specification is maintained throughout the
development process, provided that the development is correct. This is the result of the
Corollary to Theorem 2.

The  fifth  result  shows  that  development  steps  can  be  viewed  as  “building  blocks”  for 
developments.  We  obtain  correct  and  complete  developments  by  building  them  from  correct 
development  steps  and  a single  correct  and  complete  development  step.  This  is  the  result  of 
Theorem  5. 

2.1.  Preliminary  Definitions 

In  the  following,  we  mention  the  notion  of  an  implementation  being  correct  with  respect  to 
a specification.  For  the  present  we  choose  not  to  consider  issues  which  arise  in  discussions  of 
correctness;  for  example,  partial  versus  total  correctness.  Also,  we  do  not  precisely  define  what 
we  mean  by  a specification  for  a software  component.  We  defer  these  matters  until  later  when 
we  discuss  in  more  detail  a specific  method  of  software  component  specification  and  a specific 
example  of  this  method.  The  basic  idea  is  to  investigate  the  properties  of  an  abstract  model  for 
stepwise development which are independent of the specification method and notion of
correctness. We then use the abstract model to study an incremental development for a particular
specification method and a particular notion of correctness. This enables us to distinguish
between those properties which are characteristic of an incremental development and those
properties which are intrinsic to a specific incremental development method. We give some basic
definitions,  most  of  which  are  standard,  which  are  used  in  the  construction  of  the  abstract  model. 
Notation: Let

$\mathrm{Bool} = \{\mathit{true}, \mathit{false}\}$

$\mathrm{SPEC} = \{ S \mid S \text{ is a specification} \}$

$\mathrm{IMPL} = \{ p \mid p \text{ is an implementation} \}$.

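Definition: Let $f : \mathrm{SPEC} \times \mathrm{IMPL} \to \mathrm{Bool}$ be defined as follows:

$$f(S, p) = \begin{cases} \mathit{true} & \text{if } p \text{ is correct with respect to } S \\ \mathit{false} & \text{otherwise.} \end{cases}$$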

Definition: Let $S$ be an element of SPEC. Let $C = \{ p \in \mathrm{IMPL} \mid f(S, p) = \mathit{true} \}$.


It may be that $C = \emptyset$, that is, $S$ is a specification for which there exists no correct
implementation. This can occur if, for example, the specification $S$ is inconsistent.
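The definitions of $f$ and $C$ can be made concrete with a small sketch. It assumes a finite toy universe IMPL (the integers 0 to 9) and models a specification as a Python predicate; both choices are assumptions made only for this illustration. The inconsistent specification shows how $C = \emptyset$ arises.

```python
from typing import Callable, FrozenSet, Iterable

# Assumptions made only for this illustration: an implementation is an int
# drawn from a finite universe, and a specification is a predicate on it.
Spec = Callable[[int], bool]


def f(spec: Spec, p: int) -> bool:
    """f(S, p) is True iff p is correct with respect to S."""
    return bool(spec(p))


def correct_impls(spec: Spec, impl_universe: Iterable[int]) -> FrozenSet[int]:
    """C = { p in IMPL | f(S, p) = true }."""
    return frozenset(p for p in impl_universe if f(spec, p))


def s_even(p: int) -> bool:
    """A satisfiable toy specification: p is even."""
    return p % 2 == 0


def s_inconsistent(p: int) -> bool:
    """An inconsistent toy specification: p is both even and odd."""
    return p % 2 == 0 and p % 2 == 1


IMPL = range(10)
print(sorted(correct_impls(s_even, IMPL)))   # [0, 2, 4, 6, 8]
print(correct_impls(s_inconsistent, IMPL))   # frozenset()
```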

Definition: A partial order is a pair $(P, R)$ where $P$ is a set and $R$ is a relation on $P$ which is
reflexive (for all $a \in P$, $aRa$), antisymmetric (for all $a, b \in P$, $aRb$ and $bRa$ implies that $a = b$),
and transitive (for all $a, b, c \in P$, $aRb$ and $bRc$ implies that $aRc$).

Definition: Let $(P, R)$ be a partial order and $S$ a nonempty subset of $P$. $S$ is called a chain if
$aRb$ or $bRa$ (or both) holds for all $a, b \in S$. This simply means that the relation $R$ restricted
to $S$ is total.


Definition: Let $(P, R)$ be a partial order and $S$ a subset of $P$. Let $a \in S$. The element $a$ is a
least element of $S$ if, for all $b \in S$, $aRb$.

We  note  that  if  a and  b are  each  least  elements  of  S it  follows  that  aRb  and  bRa.  Since  the 
relation  R is  antisymmetric,  a = b,  so  that  a least  element,  if  it  exists,  is  unique. 

Definition: Let $(P, R)$ be a partial order and $S$ a subset of $P$. An element $a$ of $P$ is an upper
bound of $S$ (in $P$) if $bRa$ for all $b \in S$. An element $a$ of $P$ is the least upper bound (lub) of $S$ if $a$ is
the least element of the set of upper bounds of $S$ in $P$.

Definition: A partial order $(P, R)$ is a complete partial order, denoted by cpo, if the following
two conditions hold:

(1) The set $P$ has a least element.

(2) For every chain $S$ in $P$, the least upper bound $\mathrm{lub}\, S$ exists.

Theorem:  Every  partial  order  which  contains  a least  element  and  contains  only  finite  chains  is 
a cpo. 
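For a finite partial order every chain is finite, so the theorem above can be checked directly: look for a least element and for the lub of every chain. The brute-force sketch below does this for an explicitly listed partial order; the helper names are hypothetical, and since it enumerates all subsets it is only suitable for toy examples.

```python
from itertools import chain, combinations
from typing import Any, Callable, List, Sequence

Rel = Callable[[Any, Any], bool]


def least_elements(s: Sequence, rel: Rel) -> List:
    """All a in s with a R b for every b in s (at most one, by antisymmetry)."""
    return [a for a in s if all(rel(a, b) for b in s)]


def is_chain(s: Sequence, rel: Rel) -> bool:
    return len(s) > 0 and all(rel(a, b) or rel(b, a) for a in s for b in s)


def has_lub(s: Sequence, p: Sequence, rel: Rel) -> bool:
    """The set of upper bounds of s in p has a least element."""
    upper_bounds = [a for a in p if all(rel(b, a) for b in s)]
    return len(least_elements(upper_bounds, rel)) == 1


def is_cpo(p: Sequence, rel: Rel) -> bool:
    """Brute-force check of the two cpo conditions on a finite partial order."""
    if not least_elements(p, rel):
        return False
    nonempty_subsets = chain.from_iterable(
        combinations(p, r) for r in range(1, len(p) + 1))
    return all(has_lub(s, p, rel) for s in nonempty_subsets if is_chain(s, rel))


def subset(a: frozenset, b: frozenset) -> bool:
    return a <= b


P = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
print(is_cpo(P, subset))      # True
print(is_cpo(P[1:], subset))  # False: without the empty set there is no least element
```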

2.2.  The  Construction  of  the  Model 

Notation: Let $S$ be a set. By $\mathcal{P}(S)$ we mean the power set of $S$.

Lemma: If $C_i$ is a set of implementations which are correct with respect to the specification $S_i$
for some integer $i$, $i \ge 0$, then the ordered pair $(\mathcal{P}(C_i), \subseteq)$ is a partial order.

Proof: The relation $\subseteq$ is reflexive since $a \in \mathcal{P}(C_i)$ implies that $a \subseteq a$. The relation $\subseteq$ is
antisymmetric since for all $a, b \in \mathcal{P}(C_i)$, if $a \subseteq b$ and $b \subseteq a$ then $a = b$. The relation $\subseteq$ is
transitive, since for all $a, b, c \in \mathcal{P}(C_i)$, if $a \subseteq b$ and $b \subseteq c$ then $a \subseteq c$.
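The lemma can also be checked mechanically for a small $C_i$. The sketch below assumes, for illustration only, that implementations are string labels; it enumerates $\mathcal{P}(C_i)$ and tests reflexivity, antisymmetry, and transitivity of $\subseteq$ exhaustively.

```python
from itertools import chain, combinations
from typing import Any, Callable, FrozenSet, Iterable, List


def power_set(s: Iterable) -> List[FrozenSet]:
    """Every subset of s, each as a frozenset."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_partial_order(elements: List, rel: Callable[[Any, Any], bool]) -> bool:
    """Exhaustively test reflexivity, antisymmetry and transitivity."""
    reflexive = all(rel(a, a) for a in elements)
    antisymmetric = all(a == b for a in elements for b in elements
                        if rel(a, b) and rel(b, a))
    transitive = all(rel(a, c) for a in elements for b in elements
                     for c in elements if rel(a, b) and rel(b, c))
    return reflexive and antisymmetric and transitive


def subset(a: FrozenSet, b: FrozenSet) -> bool:
    return a <= b


C_i = {"p1", "p2", "p3"}     # hypothetical set of correct implementations
P_Ci = power_set(C_i)
print(len(P_Ci), is_partial_order(P_Ci, subset))  # 8 True
```

Replacing $\subseteq$ by proper inclusion makes the check fail, because proper inclusion is not reflexive.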

2.3.  Classes  of  Developments 

In this section we introduce definitions of an abstract program, a development with respect
to a specification, a correct development, a complete development, and an incomplete
development. The formal definitions concerning a development and the various classifications of
developments correspond to the intuitive notions of development and kinds of developments
which occur in a stepwise development process.

Definition: An abstract program $A$ is an ordered pair, $(S, C)$, such that the first member of the
pair,  S,  is  a specification,  and  the  second  member  of  the  pair,  C,  is  a set  of  implementations 
which  are  correct  with  respect  to  S. 

Definition: A development with respect to a specification $S_0$ is an $(n+1)$-tuple of abstract
programs, $(A_0, A_1, \ldots, A_n)$, for some nonnegative integer $n$, such that for each $i$, $0 \le i \le n$,
$A_i = (S_i, C_i)$.

In the discussion of developments, we shall usually write out the abstract programs
explicitly as ordered pairs of specifications and sets of implementations, since it is the interaction of
the components of the ordered pair, rather than the abstract programs themselves, which is of
interest.

Notation: Let $S$ be a set. By $|S|$ we mean the cardinality of $S$.

Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is
correct if $C_{i+1} \subseteq C_i$, $0 \le i < n$.

It is possible that a correct development with respect to a specification $S_0$ will have a set of
implementations $C_i$ for which $C_i = \emptyset$. Then all $C_j$, for $j > i$, will also be equal to the empty
set. These kinds of developments will not be of interest in themselves, since they do not lead to
an implementation which is correct with respect to the original specification $S_0$. What is needed
is an additional property for developments which will ensure that the sets of implementations
associated with the specifications in the development will all be nonempty. There are two
properties of developments which enable us to describe the kinds of developments that we wish to
consider. These two properties are independent of the correctness of a development, but will be
used only in conjunction with correct developments. A correct development which is complete
is a development ending in a single implementation which is correct with respect to the
specification with which it is associated. A correct development which is incomplete is a
development which may be extended (in a sense which will be made precise later) to form a correct
and complete development. We define the notions of a complete development and an incomplete
development more precisely.

Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is
complete if $|C_n| = 1$.

Definition: A development with respect to a specification $S_0$, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, is
incomplete if $|C_n| > 1$.
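A minimal Python sketch of these definitions follows. It assumes a finite toy universe of implementations (the integers 0 to 9) and models each specification as a predicate; these encodings and the names `s0`, `s1`, `s2` are hypothetical. The sketch reads the definition of an abstract program as requiring only that every member of $C$ be correct with respect to $S$.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumptions for illustration only: implementations are the integers 0..9
# and a specification is a predicate over them.
Spec = Callable[[int], bool]
AbstractProgram = Tuple[Spec, FrozenSet[int]]
IMPL = frozenset(range(10))


def correct_set(spec: Spec) -> FrozenSet[int]:
    """All implementations in IMPL that are correct with respect to spec."""
    return frozenset(p for p in IMPL if spec(p))


def is_development(d: List[AbstractProgram]) -> bool:
    """Each C_i consists only of implementations correct with respect to S_i."""
    return len(d) >= 1 and all(all(spec(p) for p in c) for spec, c in d)


def is_correct(d: List[AbstractProgram]) -> bool:
    return all(d[i + 1][1] <= d[i][1] for i in range(len(d) - 1))


def is_complete(d: List[AbstractProgram]) -> bool:
    return len(d[-1][1]) == 1


def is_incomplete(d: List[AbstractProgram]) -> bool:
    return len(d[-1][1]) > 1


def s0(p: int) -> bool:
    return p < 8


def s1(p: int) -> bool:
    return p < 8 and p % 2 == 0


def s2(p: int) -> bool:
    return p == 4


D = [(s, correct_set(s)) for s in (s0, s1, s2)]
print(is_development(D), is_correct(D), is_complete(D))  # True True True
print(is_incomplete(D[:2]))                               # True
```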

2.4.  Properties  of  Classes  of  Developments 

In  this  section  we  show  that  correct  developments  and  correct  and  complete  developments 
have  the  properties  that  we  would  expect  of  developments  which  resulted  in  implementations 
which  satisfy  the  original  specification  for  a software  component.  The  sets  of  implementations 
associated  with  all  correct  and  complete  developments  with  respect  to  a given  specification  have 
a particularly  nice  structure  when  ordered  by  the  relation  of  set  inclusion.  The  result  is  that  the 
collection  of  all  such  sets  of  implementations  with  the  order  relation  of  set  inclusion  is  a complete 
partial  order. 

Theorem 1: Let $D$ be a correct and complete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with
respect to the specification $S_0$. It follows that:

(1) for each integer $i$, $0 \le i \le n$, $C_i$ is a subset of the set of all implementations which are
correct with respect to the specification $S_i$

(2) $C_{i+1} \subseteq C_i$, $0 \le i < n$

(3) $|C_n| = 1$.

Proof: Property (1) follows from the fact that $D$ is a development. Property (2) follows from
the fact that $D$ is correct, and property (3) holds because $D$ is complete.

Corollary: Let $D$ be a correct and complete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with
respect to the specification $S_0$. It follows that for each $i$, $0 \le i \le n$, $C_i \ne \emptyset$.

Proof: Property (2) implies the nesting of all of the $C_i$'s with respect to set inclusion, starting
with $C_0$ down to $C_n$. Property (3) states that, at the last stage of the development, the set of all
implementations which satisfy $S_n$, that is, the most deeply nested of the $C_i$'s, is a singleton set. Since
$C_n \subseteq C_i$ for every $i$, a direct consequence of property (2) and property (3) is the following:

$C_i \ne \emptyset$ for $0 \le i \le n$.

Corollary: Let $D$ be a correct and incomplete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with
respect to the specification $S_0$. It follows that for each $i$, $0 \le i \le n$, $C_i \ne \emptyset$.

Proof: The proof is similar to that of the previous corollary, except that the most deeply nested of the
$C_i$'s is a set with cardinality greater than 1.

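Theorem 2: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a correct development with respect to the
specification $S_0$. If $p \in C_i$ for some integer $i$, $0 \le i \le n$, then $p$ is correct with respect to all
specifications $S_j$, $0 \le j \le i$.

Proof: If $p \in C_i$ for some integer $i$, $0 \le i \le n$, then $p \in C_j$ for all integers $j$, $0 \le j \le i$, from the
definition of a correct development. Since every member of $C_j$ is correct with respect to $S_j$, $p$ is
correct with respect to each $S_j$, $0 \le j \le i$.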

Corollary: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a correct and complete development with
respect to the specification $S_0$. If $p \in C_n$, then $p$ is correct with respect to all specifications $S_i$,
$0 \le i \le n$.
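Theorem 2 can be tested on a toy development. The sketch assumes, for illustration only, that implementations are the integers 0 to 9 and that specifications are predicates (the names `s0`, `s1`, `s2` are hypothetical); it checks that every $p \in C_i$ satisfies each $S_j$, $0 \le j \le i$, and shows that the property can fail when the development is not correct.

```python
from typing import Callable, FrozenSet, List, Tuple

# Assumptions for illustration only: implementations are the integers 0..9
# and a specification is a predicate over them.
Spec = Callable[[int], bool]
Development = List[Tuple[Spec, FrozenSet[int]]]
IMPL = frozenset(range(10))


def correct_set(spec: Spec) -> FrozenSet[int]:
    return frozenset(p for p in IMPL if spec(p))


def theorem_2_holds(d: Development) -> bool:
    """Every p in C_i satisfies every S_j with 0 <= j <= i."""
    return all(d[j][0](p)
               for i, (_, c_i) in enumerate(d)
               for p in c_i
               for j in range(i + 1))


def s0(p: int) -> bool:
    return p < 8


def s1(p: int) -> bool:
    return p < 8 and p % 2 == 0


def s2(p: int) -> bool:
    return p == 4


correct_dev = [(s, correct_set(s)) for s in (s0, s1, s2)]
# Not a correct development: C_1 = {0, ..., 7} is not a subset of C_0 = {4}.
incorrect_dev = [(s2, correct_set(s2)), (s0, correct_set(s0))]

print(theorem_2_holds(correct_dev))    # True
print(theorem_2_holds(incorrect_dev))  # False: e.g. 0 is in C_1 but violates S_0
print(all(len(c) > 0 for _, c in correct_dev))  # True, as in the corollaries
```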

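Lemma 2: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a correct development with respect to a
specification $S_0$. Let $S = \{C_0, C_1, \ldots, C_n\}$. Then $S$ is a finite chain in $(\mathcal{P}(C_0), \subseteq)$.

Proof: Clearly, $S$ is finite. Since $C_{i+1} \subseteq C_i$ for $0 \le i < n$, transitivity gives $C_j \subseteq C_i$ whenever
$i \le j$; in particular each $C_i \subseteq C_0$, and the order relation $\subseteq$ restricted to the set $S$ is total.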

Definition:  We  call  the  set  S a finite  chain  associated  with  the  correct  development. 

Notation: Let $DV$ be the union of $\{\emptyset\}$ with the collection of all finite chains associated with all
correct and complete developments with respect to a specification $S_0$.

Theorem 3: The ordered pair $(DV, \subseteq)$ is a cpo.

Proof: By the lemma of section 2.2, $(\mathcal{P}(C_0), \subseteq)$ is a partial order. With respect to the same
order relation $\subseteq$, but on the subset $DV$ of $\mathcal{P}(C_0)$, $(DV, \subseteq)$ is a partial order. The least element
of $DV$ is $\emptyset$. By the preceding lemma and the fact that any correct and complete development
with respect to the specification $S_0$ is also a finite chain in $(DV, \subseteq)$, $(DV, \subseteq)$ contains only finite
chains. By the theorem of section 2.1, $(DV, \subseteq)$ is a cpo.
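The structure of $(DV, \subseteq)$ can be explored on a small example. The sketch below reads the notation for $DV$ as the set containing $\emptyset$ and every $C_i$ that occurs in the finite chain of some given correct and complete development; this reading, the two developments, and the implementation labels are assumptions made for illustration. The cpo check enumerates all subsets, so it is only practical for a handful of sets.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Chain = List[FrozenSet[str]]   # the sets C_0, C_1, ..., C_n of one development


def build_dv(developments: Iterable[Chain]) -> Set[FrozenSet[str]]:
    """The empty set together with every C_i of every given development."""
    dv: Set[FrozenSet[str]] = {frozenset()}
    for dev in developments:
        dv.update(dev)
    return dv


def least(sets: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    return [a for a in sets if all(a <= b for b in sets)]


def is_cpo_under_subset(p: Set[FrozenSet[str]]) -> bool:
    """Brute force: a least element exists and every chain has a lub in p."""
    elems = list(p)
    if not least(elems):
        return False
    for r in range(1, len(elems) + 1):
        for s in combinations(elems, r):
            if all(a <= b or b <= a for a in s for b in s):  # s is a chain
                upper = [a for a in elems if all(b <= a for b in s)]
                if len(least(upper)) != 1:
                    return False
    return True


C0 = frozenset({"p1", "p2", "p3"})
dev1 = [C0, frozenset({"p1", "p2"}), frozenset({"p1"})]
dev2 = [C0, frozenset({"p2", "p3"}), frozenset({"p3"})]
DV = build_dv([dev1, dev2])
print(len(DV), is_cpo_under_subset(DV))  # 6 True
```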

2.5.  Classes  of  Development  Steps 

In this section we introduce the notion of a development step and several classifications of
development steps. We classify development steps as correct, incomplete, and complete. Correct
development steps are those development steps that have properties which make these steps
suitable for use in constructing correct developments. Incomplete development steps are used in
constructing all but the final step in a development, while a complete development step is used as
the final step in the construction of a development. The notion of a development step is
fundamental, since it is the concept which describes the result of a process of going from one abstract
program to another.

Definition: A development step with respect to a specification $S_i$ is an ordered pair of abstract
programs, $(A_i, A_{i+1})$, such that $A_i = (S_i, C_i)$ and $A_{i+1} = (S_{i+1}, C_{i+1})$.

For  the  same  reasons  presented  in  the  discussion  of  developments,  in  the  discussion  of 
development  steps,  we  shall  usually  write  out  the  abstract  programs  explicitly  as  ordered  pairs  of 
specifications  and  sets  of  implementations. 

Definition: Let $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$ be a development with respect to a specification
$S_0$. Let $((S_j, C_j), (S_{j+1}, C_{j+1}))$ be a development step with respect to the specification $S_j$. The
development contains the development step if $j = i$ for some integer $i$, $0 \le i \le n - 1$, that is, the
development step is $((S_i, C_i), (S_{i+1}, C_{i+1}))$, where $(S_i, C_i)$ and $(S_{i+1}, C_{i+1})$ are successive members
of the $(n+1)$-tuple which is the development with respect to the specification $S_0$.

Definition: A development step with respect to a specification $S_i$, for some nonnegative integer $i$,
$((S_i, C_i), (S_{i+1}, C_{i+1}))$, is correct if the following hold:

(1) $C_i \ne \emptyset$ and $C_{i+1} \ne \emptyset$

(2) $C_{i+1} \subseteq C_i$.

Definition: A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is
complete if $|C_{i+1}| = 1$.

Definition: A development step with respect to a specification $S_i$, $((S_i, C_i), (S_{i+1}, C_{i+1}))$, is
incomplete if $|C_{i+1}| > 1$.
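The definitions of containment and of the classes of development steps translate directly into predicates. In this sketch, which assumes a hypothetical encoding of an abstract program as a specification name paired with a `frozenset` of implementation labels, `contains` checks for successive members of the development.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an abstract program (S_i, C_i) is a pair of a
# specification name and a finite set of implementation labels.
AbstractProgram = Tuple[str, FrozenSet[str]]
Step = Tuple[AbstractProgram, AbstractProgram]


def contains(development: List[AbstractProgram], step: Step) -> bool:
    """True iff step = ((S_i,C_i),(S_{i+1},C_{i+1})) for some 0 <= i <= n-1."""
    return any((development[i], development[i + 1]) == step
               for i in range(len(development) - 1))


def is_correct_step(step: Step) -> bool:
    (_, c_i), (_, c_next) = step
    return len(c_i) > 0 and len(c_next) > 0 and c_next <= c_i


def is_complete_step(step: Step) -> bool:
    return len(step[1][1]) == 1


def is_incomplete_step(step: Step) -> bool:
    return len(step[1][1]) > 1


A0 = ("S0", frozenset({"p1", "p2", "p3"}))
A1 = ("S1", frozenset({"p1", "p2"}))
A2 = ("S2", frozenset({"p2"}))
D = [A0, A1, A2]

print(contains(D, (A1, A2)), contains(D, (A0, A2)))            # True False
print(is_correct_step((A0, A1)), is_incomplete_step((A0, A1)))  # True True
print(is_correct_step((A1, A2)), is_complete_step((A1, A2)))    # True True
```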

2.6.  The  Extension  of  Developments  with  Development  Steps 

The  results  in  this  section  show  that  developments  can  be  extended  by  development  steps  to 
form  new  developments.  The  resulting  new  developments  have  properties  which  depend  upon  the 
original  developments  and  the  development  steps. 

Lemma: Let $D$ be a development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the
specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a development step with respect to $S_n$.
Let $D_1$ be the ordered $(n+2)$-tuple $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a
development with respect to the specification $S_0$ which contains the given development step.

Proof: The ordered $(n+2)$-tuple $D_1$ is a development with respect to the specification $S_0$,
since it can be shown that

(1) for each $i$, $0 \le i \le n$, $C_i$ is the set of all implementations which are correct with respect
to the specification $S_i$

(2) $C_{n+1}$ is the set of all implementations which are correct with respect to the specification
$S_{n+1}$.

Property (1) follows from the assumption that $D$ is a development, while property (2) follows
from the assumption that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a development step with respect to $S_n$. The
development $D_1$ is clearly a development which contains the given development step,
$((S_n, C_n), (S_{n+1}, C_{n+1}))$.

Lemma: Let $D$ be a correct development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a correct development step with respect to $S_n$. Then $D_1$, the ordered $(n+2)$-tuple $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$, is a correct development with respect to the specification $S_0$ which contains the given development step.

Proof: From the preceding lemma, $D_1$ is a development with respect to the specification $S_0$ which contains the given development step. $D_1$ is a correct development since it can be shown that

(1) $C_{i+1} \subseteq C_i$, $0 \le i < n$;

(2) $C_{n+1} \subseteq C_n$.

Property (1) follows from the assumption that $D$ is correct, and property (2) follows from the assumption that the development step $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a correct development step with respect to $S_n$.

Theorem 4: Let $D$ be a correct development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete and correct development step with respect to $S_n$. Let $D_1$ be $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a correct and complete development with respect to the specification $S_0$ which contains the given development step.

Proof: From the preceding lemma, $D_1$ is a correct development with respect to the specification $S_0$ which contains the given development step. Because $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete development step with respect to $S_n$, it follows that $|C_{n+1}| = 1$. This shows that the development $D_1$ is complete.

Corollary: Let $D$ be a correct and incomplete development, $((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$, with respect to the specification $S_0$. Suppose that $((S_n, C_n), (S_{n+1}, C_{n+1}))$ is a complete and correct development step with respect to $S_n$. Let $D_1$ be $((S_0, C_0), (S_1, C_1), \ldots, (S_{n+1}, C_{n+1}))$. Then $D_1$ is a correct and complete development with respect to the specification $S_0$ which contains the given development step.
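The extension construction used in the two lemmas, Theorem 4 and the corollary can be traced on a small finite example. In the Python sketch below a development is a list of pairs $(S_i, C_i)$ with finite $C_i$ (an illustrative assumption), and the hypothetical function `extend` appends the second member of a step whose first member is the last element of the development. The correctness check assumes that a correct development is one in which every $C_i$ is non-empty and $C_{i+1} \subseteq C_i$, mirroring the definition of a correct step; all names are hypothetical.

```python
from typing import FrozenSet, Hashable, List, Tuple

AbstractProgram = Tuple[Hashable, FrozenSet[Hashable]]
Development = List[AbstractProgram]


def is_correct(dev: Development) -> bool:
    # Assumed reading of "correct": non-empty sets forming a chain C_0 ⊇ C_1 ⊇ ... ⊇ C_n.
    return all(len(c) > 0 for _, c in dev) and all(
        dev[i + 1][1] <= dev[i][1] for i in range(len(dev) - 1)
    )


def is_complete(dev: Development) -> bool:
    # Complete: the last set of implementations is a singleton.
    return len(dev[-1][1]) == 1


def extend(dev: Development, step: Tuple[AbstractProgram, AbstractProgram]) -> Development:
    """Form D_1 = D followed by (S_{n+1}, C_{n+1}); the step must start at (S_n, C_n)."""
    first, second = step
    if first != dev[-1]:
        raise ValueError("the step does not start at the last abstract program of D")
    return dev + [second]


# A correct but incomplete development D, extended by a complete and correct step.
D = [("S0", frozenset({"a", "b", "c"})), ("S1", frozenset({"a", "b"}))]
D1 = extend(D, (("S1", frozenset({"a", "b"})), ("S2", frozenset({"a"}))))
print(is_correct(D), is_complete(D), is_correct(D1), is_complete(D1))
```

The example corresponds to the situation of the corollary: $D$ is correct but incomplete, and a single complete and correct step yields a correct and complete $D_1$.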

2.7.  The  Construction  of  Developments  from  Development  Steps 

In  this  section  we  show  that  developments  can  be  constructed  from  development  steps.  The 
properties  of  the  resulting  developments  are  dependent  upon  the  properties  of  the  development 
steps  used  in  the  construction  of  the  developments. 

Lemma: Let $((S_0, C_0), (S_1, C_1))$, $((S_1, C_1), (S_2, C_2))$, ..., $((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of $n$ development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$, respectively, for some positive integer $n$. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a development with respect to the specification $S_0$.

Proof:  This  follows  immediately  from  the  definition  of  a development  step. 

Lemma: Let $((S_0, C_0), (S_1, C_1))$, $((S_1, C_1), (S_2, C_2))$, ..., $((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of $n$ correct development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$, respectively, for some positive integer $n$. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a correct development with respect to the specification $S_0$.

Proof: By the preceding lemma, $D$ is a development with respect to the specification $S_0$. Since $((S_i, C_i), (S_{i+1}, C_{i+1}))$ is a correct development step with respect to the specification $S_i$ for each integer $i$, $0 \le i < n$, we have $C_{i+1} \subseteq C_i$. It follows that $D$ is a correct development.

Theorem 5: Let $((S_0, C_0), (S_1, C_1))$, $((S_1, C_1), (S_2, C_2))$, ..., $((S_{n-1}, C_{n-1}), (S_n, C_n))$ be a collection of $n$ correct development steps with respect to the specifications $S_0, S_1, \ldots, S_{n-1}$, respectively, for some positive integer $n$. Furthermore, suppose that $((S_{n-1}, C_{n-1}), (S_n, C_n))$ is a complete development step. Let $D = ((S_0, C_0), (S_1, C_1), \ldots, (S_n, C_n))$. Then $D$ is a correct and complete development with respect to the specification $S_0$.

Proof: From the preceding lemma, $D$ is a correct development with respect to the specification $S_0$. Since $((S_{n-1}, C_{n-1}), (S_n, C_n))$ is a complete development step, $|C_n| = 1$. It follows that $D$ is a complete development.
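Theorem 5 can likewise be illustrated by chaining steps. The Python sketch below (hypothetical names, finite implementation sets assumed) builds $D$ from a list of development steps, rejecting steps that do not link up, and then checks the two hypotheses of the theorem: every step is correct, and the final step is complete.

```python
from typing import FrozenSet, Hashable, List, Tuple

AbstractProgram = Tuple[Hashable, FrozenSet[Hashable]]
Step = Tuple[AbstractProgram, AbstractProgram]


def development_from_steps(steps: List[Step]) -> List[AbstractProgram]:
    """Build ((S_0, C_0), ..., (S_n, C_n)) from n steps that share endpoints."""
    dev = [steps[0][0]]
    for first, second in steps:
        if first != dev[-1]:
            raise ValueError("consecutive steps do not share an abstract program")
        dev.append(second)
    return dev


def step_is_correct(step: Step) -> bool:
    (_, c_i), (_, c_next) = step
    return len(c_i) > 0 and len(c_next) > 0 and c_next <= c_i


def theorem5_applies(steps: List[Step]) -> bool:
    # Hypotheses of Theorem 5: all steps correct, last step complete (|C_n| = 1).
    return all(step_is_correct(s) for s in steps) and len(steps[-1][1][1]) == 1


A0 = ("S0", frozenset({"w1", "w2", "w3"}))
A1 = ("S1", frozenset({"w1", "w2"}))
A2 = ("S2", frozenset({"w2"}))
steps = [(A0, A1), (A1, A2)]
D = development_from_steps(steps)
print(len(D), theorem5_applies(steps))
```

When `theorem5_applies` returns true, the theorem guarantees that the assembled `D` is a correct and complete development.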

3. An Example of a Formal Development

In  this  section  we  relate  the  abstract  model  to  an  example  of  a formal  development.  In  this 
example an implementation is a while-program; that is, the programming language allows assignment statements, composed statements, conditional statements, and while statements. Correctness means partial correctness of a while-program with respect to a specification. This is an extension of the concept of partial correctness of a while-program with respect to pre- and post-conditions, in which the pre- and post-conditions are well-formed formulas from first-order predicate logic. A specification is based upon the notion of pairs of pre- and post-conditions and the statements allowed by the while-programming language. An abstract program $A$ is a pair $(S, C)$ for which $C$ is a set of while-programs which are partially correct with respect to the specification $S$.

In a formal development we have an $(n+1)$-tuple of abstract programs, $(A_0, A_1, \ldots, A_n)$, such that $A_i = (S_i, C_i)$ for $0 \le i \le n$. For the first abstract program in the development, $A_0 = (S_0, C_0)$, the specification $S_0$ is the original specification and the set $C_0$ is a set of while-programs which are partially correct with respect to $S_0$. The final abstract program in the development is $(S_n, C_n)$, where $S_n$ is a specification which specifies a single while-program. We call such a specification an annotated program. The set $C_n$ is a singleton set $\{W\}$, where $W$ is a while-program which is partially correct with respect to $S_n$.

From the abstract model, the successive pairs of abstract programs in a correct development must be related to one another in the sense that each successive pair is a correct development step. In the example, we obtain constraints which ensure that the successive pairs of abstract programs are correct development steps and that the last abstract program in the development, $(S_n, C_n)$, has the property that $|C_n| = 1$. These constraints are consequences of the definitions which we introduce and the properties of the Hoare calculus.


17 


July  29,  1986 


DRAFT 


3.1.  Preliminary  Definitions 

The  following  definitions  and  notation  provide  the  basic  framework  used  in  the  discussion  of 
the  example. 

Definition: A set is recursively enumerable if there exists an algorithm which recognizes elements of the set but which may not terminate for elements not in the set.

Definition: The logical symbols are exactly the following:
the connectives $\neg$, $\wedge$, $\vee$, $\Rightarrow$, and $\Leftrightarrow$
the equality symbol $=$
the existential quantifier $\exists$ and the universal quantifier $\forall$
the four punctuation marks $.$, $($, $)$, and $,$
the variables $x, y, z, x_1, \ldots, x', \ldots$
the truth symbols true and false.

Notation: The set of symbols in the language of first-order predicate logic which are to be variables is denoted by $V$. We assume that $V$ is infinite but recursively enumerable.

Notation: The extralogical symbols are taken from two arbitrarily chosen sets, which are disjoint from one another as well as disjoint from the set of all logical symbols. These two sets are:

$F$, the set of function symbols;

$P$, the set of predicate symbols.

We assume that the sets $F$ and $P$ are both recursively enumerable.

Definition: A basis for predicate logic is a pair $B = (F, P)$ of sets of symbols, where $F$ and $P$ are understood to be the sets of function and predicate symbols previously described.

Definition: The set $T_B$ of all terms of (first-order) predicate logic over a basis $B = (F, P)$ is defined inductively by:

a) Every variable from $V$ and every constant from $F$ is a term.

b) If $t_1, \ldots, t_n$ are terms for $n \ge 1$ and $f \in F$ is an $n$-ary function symbol, then $f(t_1, \ldots, t_n)$ is a term.

Definition: (Syntax of Predicate Logic) The set $WFF_B$ of all (well-formed) formulas of (first-order) predicate logic over a basis $B = (F, P)$ is defined inductively by:

(a) The truth symbols true and false are formulas.

Every propositional constant from $P$ is a formula.

If $t_1$ and $t_2$ are terms, then $t_1 = t_2$ is a formula.

If $t_1, \ldots, t_n$ for $n \ge 1$ are terms and $p \in P$ is an $n$-ary predicate symbol, then $p(t_1, \ldots, t_n)$ is a formula.

(b) If $w$ is a formula, then $(\neg w)$ is a formula.

If $w$ is a formula and $x$ is a variable, then $(\forall x.w)$ and $(\exists x.w)$ are also formulas.

If $w_1$ and $w_2$ are formulas, then so are $(w_1 \wedge w_2)$, $(w_1 \vee w_2)$, $(w_1 \Rightarrow w_2)$, and $(w_1 \Leftrightarrow w_2)$.

Notation: The set of all well-formed formulas which do not have any quantifier is denoted by $QFF_B$. A well-formed formula which does not have any quantifier is called quantifier free.

Definition: (Interpretation) Let $B = (F, P)$ be a basis for predicate logic. An interpretation of $B$ is a pair $I = (D, I_0)$, where $D$ is a non-empty set (called the domain of $I$) and $I_0$ is a mapping which assigns

(1) to every constant $c \in F$ an element $I_0(c) \in D$;

(2) to every function symbol $f \in F$ of arity $n \ge 1$ a total function $I_0(f): D^n \to D$;

(3) to every propositional constant $a \in P$ an element $I_0(a) \in \mathrm{Bool}$;

(4) to every predicate symbol $p \in P$ of arity $n \ge 1$ a predicate $I_0(p): D^n \to \mathrm{Bool}$.

Definition: A total function $\sigma: V \to D$ mapping variables to the domain $D$ of some interpretation is called an assignment or state. The set of all assignments for some interpretation $I$ is denoted by $\Sigma_I$ or just by $\Sigma$.

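Definition: (Semantics of Predicate Logic) Let $I = (D, I_0)$ be an interpretation for a basis $B = (F, P)$. To $I$ is associated a functional, also denoted by $I$, which maps every term $t \in T_B$ to a function $I(t): \Sigma \to D$ and every formula $w \in WFF_B$ to a function $I(w): \Sigma \to \mathrm{Bool}$; the functions $I(t)$ and $I(w)$ are defined as follows:

Semantics of terms

(a) If $c \in F$ is a constant, then

$I(c)(\sigma) = I_0(c)$ for all assignments $\sigma \in \Sigma$.

If $x \in V$ is a variable, then

$I(x)(\sigma) = \sigma(x)$ for all assignments $\sigma \in \Sigma$.

(b) If $t_1, \ldots, t_n$ for $n \ge 1$ are terms and $f \in F$ is an $n$-ary function symbol, then

$I(f(t_1, \ldots, t_n))(\sigma) = I_0(f)(I(t_1)(\sigma), \ldots, I(t_n)(\sigma))$ for all assignments $\sigma \in \Sigma$.

Semantics of formulas

(a) $I(\mathit{true})(\sigma) = \mathit{true}$ for all $\sigma \in \Sigma$.

$I(\mathit{false})(\sigma) = \mathit{false}$ for all $\sigma \in \Sigma$.

If $a \in P$ is a propositional constant, then

$I(a)(\sigma) = I_0(a)$ for all $\sigma \in \Sigma$.

If $t_1, t_2$ are terms, then

$$I(t_1 = t_2)(\sigma) = \begin{cases} \mathit{true} & \text{if } I(t_1)(\sigma) = I(t_2)(\sigma) \\ \mathit{false} & \text{otherwise,} \end{cases}$$

for all $\sigma \in \Sigma$.

If $t_1, \ldots, t_n$ for $n \ge 1$ are terms and $p \in P$ is an $n$-ary predicate symbol, then

$I(p(t_1, \ldots, t_n))(\sigma) = I_0(p)(I(t_1)(\sigma), \ldots, I(t_n)(\sigma))$ for all $\sigma \in \Sigma$.

(b) If $w \in WFF_B$ is a formula, then

$$I((\neg w))(\sigma) = \begin{cases} \mathit{true} & \text{if } I(w)(\sigma) = \mathit{false} \\ \mathit{false} & \text{otherwise,} \end{cases}$$

for all $\sigma \in \Sigma$.

If $w_1, w_2 \in WFF_B$, then analogous statements hold for $(w_1 \wedge w_2)$, $(w_1 \vee w_2)$, $(w_1 \Rightarrow w_2)$, and $(w_1 \Leftrightarrow w_2)$.

If $w \in WFF_B$ and $x \in V$, then

$$I((\exists x.w))(\sigma) = \begin{cases} \mathit{true} & \text{if there exists } d \in D \text{ such that } I(w)(\sigma[x/d]) = \mathit{true} \\ \mathit{false} & \text{otherwise,} \end{cases}$$

for all $\sigma \in \Sigma$, where $\sigma[x/d]$ denotes the assignment which agrees with $\sigma$ except that it maps $x$ to $d$.

If $w \in WFF_B$ and $x \in V$, then

$$I((\forall x.w))(\sigma) = \begin{cases} \mathit{true} & \text{if for all } d \in D,\ I(w)(\sigma[x/d]) = \mathit{true} \\ \mathit{false} & \text{otherwise,} \end{cases}$$

for all $\sigma \in \Sigma$.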

Definition: A formula $w$ is called valid in an interpretation $I$, denoted by $\models_I w$, if $I(w)(\sigma) = \mathit{true}$ for all assignments $\sigma \in \Sigma_I$. The set of all formulas valid in $I$ is denoted by $\mathrm{Th}(I)$.

Definition: A formula $w$ is called logically valid, denoted by $\models w$, if it is valid in all interpretations.

Definition: Let $W \subseteq WFF_B$ be a set of well-formed formulas of predicate logic. An interpretation $I$ is called a model of $W$ if $\models_I w$ for every formula $w \in W$. A formula $w \in WFF_B$ is called a logical consequence of $W$, denoted by $W \models w$, if $\models_I w$ for every model $I$ of $W$. The set of all logical consequences of $W$ is denoted by $\mathrm{Cn}(W)$.
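The semantic clauses above translate almost line by line into an evaluator. The Python sketch below is illustrative only: terms and formulas are encoded as nested tuples (an encoding of our own, not from the report), the domain $D$ is assumed to be finite so that $\exists$ and $\forall$ can be decided by enumeration, and an assignment is a dictionary that only needs to cover the variables that actually occur. The hypothetical function `valid_in` checks $\models_I w$ by enumerating all assignments to a given list of variables, which must include the free variables of $w$.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable, Dict, List


@dataclass
class Interpretation:
    domain: List[Any]  # assumed finite
    consts: Dict[str, Any] = field(default_factory=dict)
    funcs: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    preds: Dict[str, Callable[..., bool]] = field(default_factory=dict)  # arity 0: propositional constant


def eval_term(I: Interpretation, t: tuple, sigma: Dict[str, Any]) -> Any:
    kind = t[0]
    if kind == "var":                        # I(x)(σ) = σ(x)
        return sigma[t[1]]
    if kind == "const":                      # I(c)(σ) = I_0(c)
        return I.consts[t[1]]
    if kind == "app":                        # I(f(t1..tn))(σ) = I_0(f)(I(t1)(σ), ...)
        return I.funcs[t[1]](*(eval_term(I, s, sigma) for s in t[2]))
    raise ValueError(f"not a term: {t!r}")


def eval_formula(I: Interpretation, w: tuple, sigma: Dict[str, Any]) -> bool:
    kind = w[0]
    if kind in ("true", "false"):
        return kind == "true"
    if kind == "eq":
        return eval_term(I, w[1], sigma) == eval_term(I, w[2], sigma)
    if kind == "pred":
        return bool(I.preds[w[1]](*(eval_term(I, s, sigma) for s in w[2])))
    if kind == "not":
        return not eval_formula(I, w[1], sigma)
    if kind in ("and", "or", "imp", "iff"):
        a = eval_formula(I, w[1], sigma)
        b = eval_formula(I, w[2], sigma)
        return {"and": a and b, "or": a or b, "imp": (not a) or b, "iff": a == b}[kind]
    if kind in ("exists", "forall"):
        x, body = w[1], w[2]
        # σ[x/d] is built as a new dictionary for each d in the domain.
        values = [eval_formula(I, body, {**sigma, x: d}) for d in I.domain]
        return any(values) if kind == "exists" else all(values)
    raise ValueError(f"not a formula: {w!r}")


def valid_in(I: Interpretation, w: tuple, variables: List[str]) -> bool:
    """|=_I w, assuming the free variables of w are among `variables`."""
    return all(
        eval_formula(I, w, dict(zip(variables, values)))
        for values in product(I.domain, repeat=len(variables))
    )


# Arithmetic modulo 4: every element has a successor, and lt is irreflexive.
Z4 = Interpretation(domain=[0, 1, 2, 3],
                    funcs={"succ": lambda n: (n + 1) % 4},
                    preds={"lt": lambda a, b: a < b})
x, y = ("var", "x"), ("var", "y")
has_succ = ("forall", "x", ("exists", "y", ("eq", y, ("app", "succ", [x]))))
print(valid_in(Z4, has_succ, []), valid_in(Z4, ("not", ("pred", "lt", [x, x])), ["x"]))
```

A formula such as `lt(x, succ(x))` is not valid in this interpretation, because it fails for $x = 3$; `valid_in` detects this by finding the falsifying assignment.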

Definition: (Calculi) Let $SO$ be some set of syntactic objects. A calculus or axiomatic system over $SO$ is a pair $K = (A, R)$, where

$A$ is a finite set of axiom schemes, which are decidable subsets of $SO$; the elements of an axiom scheme are called axioms;

$R$ is a finite set of inference rules, which are decidable subsets of $SO^n \times SO$, $n \ge 1$.

Definition: Let $X$ be a (possibly empty) subset of a set $SO$ of syntactic objects. The set of all syntactic objects which are derivable from $X$ in a calculus $K = (A, R)$ over $SO$ is defined inductively by:

a) the basis set $X \cup \bigcup A$, and

b) the constructor set $R$.

If a syntactic object $s$ is derivable from a set of syntactic objects $X$ in a calculus $K$, we write $X \vdash_K s$ or $X \vdash s$. If $X$ is the empty set, we write $\vdash s$. A construction sequence of $s$ is called a deduction for $s$ from $X$ in $K$.

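Axiom schemes

(A1) $\neg w \vee w$ for all $w \in WFF_B$

(A2) $w^x_t \Rightarrow \exists x.w$ for all $w \in WFF_B$, $x \in V$, $t \in T_B$, where $w^x_t$ denotes the formula obtained from $w$ by substituting $t$ for the free occurrences of $x$

(A3) $x = x$ for all $x \in V$

(A4) $x = y \Rightarrow y = x$ for all $x, y \in V$

(A5) $x = y \wedge y = z \Rightarrow x = z$ for all $x, y, z \in V$

(A6) $x_1 = y_1 \Rightarrow \ldots \Rightarrow x_n = y_n \Rightarrow p(x_1, \ldots, x_n) \Rightarrow p(y_1, \ldots, y_n)$ for all $x_1, \ldots, x_n, y_1, \ldots, y_n \in V$, for $n \ge 1$ and all $n$-ary predicate symbols $p \in P$

(A7) $x_1 = y_1 \Rightarrow \ldots \Rightarrow x_n = y_n \Rightarrow f(x_1, \ldots, x_n) = f(y_1, \ldots, y_n)$ for all $x_1, \ldots, x_n, y_1, \ldots, y_n \in V$, for $n \ge 1$ and all $n$-ary function symbols $f \in F$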

Inference rules

(R1) $\dfrac{w \vee w}{w}$ for all $w \in WFF_B$

(R2) $\dfrac{w_2}{w_1 \vee w_2}$ for all $w_1, w_2 \in WFF_B$

(R3) $\dfrac{w_1 \vee (w_2 \vee w_3)}{(w_1 \vee w_2) \vee w_3}$ for all $w_1, w_2, w_3 \in WFF_B$

(R4) $\dfrac{w_1 \Rightarrow w_2}{(\exists x.w_1) \Rightarrow w_2}$ for all $w_1, w_2 \in WFF_B$, $x \in V$, such that $x$ is not free in $w_2$

(R5) $\dfrac{w_1 \vee w_2 \qquad \neg w_1 \vee w_3}{w_2 \vee w_3}$ for all $w_1, w_2, w_3 \in WFF_B$

3.2. The Construction of the Example

In  this  section  we  give  a precise  definition  of  the  syntax  of  while-programs,  the  syntax  of 
specifications  in  terms  of  pre-  and  post-conditions,  an  operational  semantics  for  while-programs, 
partial  correctness  of  a while-program  with  respect  to  a specification,  and  the  syntax  of  anno- 
tated programs.  The  definition  of  partial  correctness  of  a while-program  with  respect  to  a 
specification  is  an  extension  of  the  notion  of  partial  correctness  of  a while-program  with  respect 
to  formulas. 

Definition: (Syntax of $L_W$) The set $L_W^B$ of while programs for the basis $B$ is defined inductively as follows:

a) Assignment statement. If $x$ is a variable from $V$ and $t$ is a term from $T_B$, then

$x := t$

is a while program.

b) Composed statement. If $W_1, W_2$ are while programs, then

$W_1 ; W_2$

is a while program.

c) Conditional statement. If $W_1, W_2$ are while programs and $e$ is a quantifier free formula from $QFF_B$, then

if $e$ then $W_1$ else $W_2$ fi

is a while program.

d) While statement. If $W_1$ is a while program and $e$ is a quantifier free formula from $QFF_B$, then

while $e$ do $W_1$ od

is a while program.

Definition: (Syntax of $L_S$) The set $L_S^B$ of specifications for the basis $B$ is defined inductively as follows:

a) Unknown specification. If $p, q$ are formulas from $WFF_B$, then

$\{p\}\ \{q\}$

is a specification.

b) Assignment specification. If $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ x := t\ \{q\}$

is a specification.

c) Composed specification. If $S_1, S_2$ are specifications and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ S_1 ; S_2\ \{q\}$

is a specification.

d) Conditional specification. If $S_1, S_2$ are specifications, $e$ is a quantifier free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}$ if $e$ then $S_1$ else $S_2$ fi $\{q\}$

is a specification.

e) While specification. If $S_1$ is a specification, $e$ is a quantifier free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}$ while $e$ do $S_1$ od $\{q\}$

is a specification.

Definition:  Let  S be  an  arbitrary  set  of  symbols.  A sequence  of  symbols  from  S is  called  a 
string  over  S.  A set  of  strings  over  S is  called  a formal  language  over  S.  The  number  of  symbols 
in a finite string $s$ is called its length. The sequence with no symbols, which has length 0, is called the empty string and is denoted by $\varepsilon$.

Definition: A configuration for a basis $B$ and an interpretation $I$ of $B$ is a pair

$(W, \sigma) \in (L_W^B \cup \{\varepsilon\}) \times \Sigma_I$.

The first member of the ordered pair, $W$, represents the rest of the program to be executed, and $\sigma$ represents the contents of the variables.

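Definition: (Transition Relation) For every basis $B$ and every interpretation $I$ of $B$, the relation $\Rightarrow_C$ on the set of configurations $(L_W^B \cup \{\varepsilon\}) \times \Sigma_I$ is defined by:

$(W_1, \sigma_1) \Rightarrow_C (W_2, \sigma_2)$ iff one of the following six conditions holds:

(1) There is a variable $x \in V$ and a term $t \in T_B$ such that

$W_1$ is $x := t ; W_2$

and

$\sigma_2 = \sigma_1[x / I(t)(\sigma_1)]$;

(2) There are while-programs $W_1', W_2', W_3'$ and there is a quantifier free formula $e$ such that

$W_1$ is if $e$ then $W_1'$ else $W_2'$ fi $; W_3'$

and

$\sigma_2 = \sigma_1$,

$W_2$ is $\begin{cases} W_1' ; W_3' & \text{if } I(e)(\sigma_1) = \mathit{true} \\ W_2' ; W_3' & \text{if } I(e)(\sigma_1) = \mathit{false}; \end{cases}$

(3) There are while-programs $W_1', W_2'$ and there is a quantifier free formula $e$ such that

$W_1$ is while $e$ do $W_1'$ od $; W_2'$

and

$\sigma_2 = \sigma_1$,

$W_2$ is $\begin{cases} W_1' ; W_1 & \text{if } I(e)(\sigma_1) = \mathit{true} \\ W_2' & \text{if } I(e)(\sigma_1) = \mathit{false}; \end{cases}$

(4) There is a variable $x \in V$ and a term $t \in T_B$ such that

$W_1$ is $x := t$,

$W_2$ is $\varepsilon$,

and

$\sigma_2 = \sigma_1[x / I(t)(\sigma_1)]$;

(5) and (6) are similar to (4), for (2) and (3) respectively: they cover the cases in which the conditional statement or the while statement is the whole of $W_1$, so that no program $W_3'$ (respectively $W_2'$) follows it; in (6), $W_2$ is $\varepsilon$ when $I(e)(\sigma_1) = \mathit{false}$.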

Definition: A computation sequence of a while-program $W$ for a state $\sigma$, which is called an input state, is a sequence of configurations

$(W_1, \sigma_1), (W_2, \sigma_2), \ldots$

such that $W_1 = W$, $\sigma_1 = \sigma$, and for every pair of consecutive configurations in the sequence

$(W_i, \sigma_i) \Rightarrow_C (W_{i+1}, \sigma_{i+1})$

for $i \ge 1$.

A computation sequence which is either infinite or ends with a configuration $(W_k, \sigma_k)$ such that $W_k = \varepsilon$ is called a computation.

A while-program $W$ is said to terminate for an input state $\sigma$ if there is a finite computation

$(W_1, \sigma_1), \ldots, (W_k, \sigma_k)$

for this input state. The state $\sigma_k$ is called the output state.

Definition: (Operational Semantics of $L_W$) Let $W$ be a while-program from $L_W^B$ over the basis $B$ and let $I$ be an interpretation of this basis. The meaning of $W$ (in the interpretation $I$) is the partial function $\mathcal{M}_I(W): \Sigma \rightharpoonup \Sigma$, or $\mathcal{M}(W)$, defined by

$$\mathcal{M}_I(W)(\sigma) = \begin{cases} \sigma' & \text{if } W \text{ terminates for the input state } \sigma \text{ with output state } \sigma' \\ \text{undefined} & \text{if } W \text{ does not terminate for the input state } \sigma. \end{cases}$$

Definition: (Correctness with Respect to Formulas) Let $B$ be a basis for predicate logic, $I$ an interpretation of this basis, and $\Sigma$ the corresponding set of states. Let $W$ be a while-program from $L_W^B$ and let $\mathcal{M}_I(W)$ be the meaning of the program $W$. Let $p, q$ be formulas from $WFF_B$. The program $W$ is partially correct with respect to $p$ and $q$ (in the interpretation $I$) if for all states $\sigma \in \Sigma$ it follows that if $I(p)(\sigma) = \mathit{true}$ and $\mathcal{M}_I(W)(\sigma)$ is defined, then $I(q)(\mathcal{M}_I(W)(\sigma)) = \mathit{true}$.

Definition: Let $B$, $I$, $\Sigma$, $W$, $p$, $q$ be as in the preceding definition. Then the formulas $p$ and $q$ are called the pre-condition and post-condition, respectively.
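The transition relation and the meaning function $\mathcal{M}_I(W)$ can be prototyped as a small-step interpreter, which also gives a way to test (though not prove) partial correctness on a finite set of input states. The Python sketch below makes several simplifying assumptions that are not part of the report: terms and quantifier free conditions are given directly as Python functions of the state, a program is a flat tuple of statements (so that $W_1 ; W_2$ is tuple concatenation and the empty tuple plays the role of $\varepsilon$), and a computation that has not reached $\varepsilon$ within `max_steps` transitions is treated as undefined. Because of the last assumption, the hypothetical `partially_correct_on` can only find counterexamples among runs that terminate within the bound.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

State = Dict[str, Any]
# Statements (hypothetical encoding):
#   ("assign", x, t)      t: State -> value
#   ("if", e, W1, W2)     e: State -> bool, W1 and W2 are programs
#   ("while", e, W1)
# A program is a tuple of statements; () stands for the empty string ε.


def step(W: tuple, sigma: State) -> Tuple[tuple, State]:
    """One transition (W, σ) =>_C (W', σ') for a non-empty program W."""
    head, rest = W[0], W[1:]
    if head[0] == "assign":                  # cases (1) and (4)
        _, x, t = head
        return rest, {**sigma, x: t(sigma)}
    if head[0] == "if":                      # cases (2) and (5)
        _, e, w1, w2 = head
        return (w1 if e(sigma) else w2) + rest, sigma
    if head[0] == "while":                   # cases (3) and (6): body, then the loop again
        _, e, w1 = head
        return (w1 + W if e(sigma) else rest), sigma
    raise ValueError(f"unknown statement {head!r}")


def meaning(W: tuple, sigma: State, max_steps: int = 10_000) -> Optional[State]:
    """M(W)(σ): the output state, or None if ε is not reached within max_steps."""
    for _ in range(max_steps):
        if not W:
            return sigma
        W, sigma = step(W, sigma)
    return sigma if not W else None


def partially_correct_on(W, p, q, states: Iterable[State]) -> bool:
    """Check the partial-correctness condition for each listed input state."""
    for sigma in states:
        if p(sigma):
            out = meaning(W, sigma)
            if out is not None and not q(out):
                return False
    return True


# y := 1; while x > 0 do y := y * x; x := x - 1 od
fact = (("assign", "y", lambda s: 1),
        ("while", lambda s: s["x"] > 0,
         (("assign", "y", lambda s: s["y"] * s["x"]),
          ("assign", "x", lambda s: s["x"] - 1))))
print(meaning(fact, {"x": 4}))
```

A ghost variable that is never assigned, such as `n` with input state `{"x": n, "n": n}`, lets a post-condition refer to the initial value of `x`, since partial correctness constrains only the output state.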

Definition: (Correctness with Respect to Specifications) Let $W$ be a while-program from $L_W^B$. The notion that $W$ is partially correct with respect to the specification $S$ (in the interpretation $I$) is defined inductively (the induction being on the specification $S$) as follows:

a) If $S$ is an unknown specification,

$\{p\}\ \{q\}$,

where $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is partially correct with respect to $p$ and $q$ (in the interpretation $I$).

b) If $S$ is an assignment specification,

$\{p\}\ x := t\ \{q\}$,

where $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is $x := t$;

(ii) $W$ is partially correct with respect to $p$ and $q$.

c) If $S$ is a composed specification,

$\{p\}\ S_1 ; S_2\ \{q\}$,

where $S_1, S_2$ are specifications from $L_S^B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_W^B$;

(ii) $W$ is partially correct with respect to $p$ and $q$;

(iii) $W_1$ is partially correct with respect to the specification $S_1$;

(iv) $W_2$ is partially correct with respect to the specification $S_2$.

d) If $S$ is a conditional specification,

$\{p\}$ if $e$ then $S_1$ else $S_2$ fi $\{q\}$,

where $S_1, S_2$ are specifications from $L_S^B$, $e$ is a quantifier free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is if $e$ then $W_1$ else $W_2$ fi for some $W_1, W_2 \in L_W^B$;

(ii) $W$ is partially correct with respect to $p$ and $q$;

(iii) $W_1$ is partially correct with respect to the specification $S_1$;

(iv) $W_2$ is partially correct with respect to the specification $S_2$.

e) If $S$ is a while specification,

$\{p\}$ while $e$ do $S_1$ od $\{q\}$,

where $S_1$ is a specification from $L_S^B$, $e$ is a quantifier free formula from $QFF_B$, and $p, q$ are formulas from $WFF_B$, then $W$ is partially correct with respect to $S$ if

(i) $W$ is while $e$ do $W_1$ od for some $W_1 \in L_W^B$;

(ii) $W$ is partially correct with respect to $p$ and $q$;

(iii) $W_1$ is partially correct with respect to the specification $S_1$.
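Because the definition is by induction on the specification, it can be mirrored by a recursive function. In the Python sketch below the tree encodings of programs and specifications are hypothetical, terms and formulas are compared syntactically as strings, and the test "W is partially correct with respect to p and q" is delegated to a caller-supplied oracle `pc`, which in practice might be a derivation in the Hoare calculus or a bounded test. The function itself only checks the structural conditions (i) and the recursive conditions of each case.

```python
from typing import Callable

# Hypothetical tree encodings:
#   programs W:       ("assign", x, t) | ("seq", W1, W2) | ("if", e, W1, W2) | ("while", e, W1)
#   specifications S: ("unknown", p, q) | ("assign", p, x, t, q) | ("seq", p, S1, S2, q)
#                     | ("if", p, e, S1, S2, q) | ("while", p, e, S1, q)
Oracle = Callable[[tuple, str, str], bool]


def correct_wrt_spec(W: tuple, S: tuple, pc: Oracle) -> bool:
    kind = S[0]
    p, q = S[1], S[-1]
    if kind == "unknown":                    # case a): only (i) is required
        return pc(W, p, q)
    # Cases b) to e): W must have the same outermost form as S and satisfy p, q.
    if W[0] != kind or not pc(W, p, q):
        return False
    if kind == "assign":                     # W is x := t
        return (W[1], W[2]) == (S[2], S[3])
    if kind == "seq":                        # W1 meets S1 and W2 meets S2
        return correct_wrt_spec(W[1], S[2], pc) and correct_wrt_spec(W[2], S[3], pc)
    if kind == "if":                         # same condition e; branches meet S1, S2
        return (W[1] == S[2] and correct_wrt_spec(W[2], S[3], pc)
                and correct_wrt_spec(W[3], S[4], pc))
    if kind == "while":                      # same condition e; body meets S1
        return W[1] == S[2] and correct_wrt_spec(W[2], S[3], pc)
    raise ValueError(f"unknown specification kind {kind!r}")


def always(W, p, q):
    # Stand-in oracle that accepts every (W, p, q); used only to exercise the structure check.
    return True


S = ("seq", "true", ("assign", "true", "x", "1", "x = 1"), ("unknown", "x = 1", "y = 1"), "y = 1")
W = ("seq", ("assign", "x", "1"), ("assign", "y", "x"))
print(correct_wrt_spec(W, S, always))
```

The unknown sub-specification $\{x = 1\}\ \{y = 1\}$ places no structural constraint on the second statement, which is exactly the freedom that later refinement steps exploit.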

Definition:  Let  W,  I,  S,  p and  q be  as  in  the  preceding  definition.  Then  the  formulas  p and  q are 
called,  respectively,  the  pre-condition  and  post-condition  associated  with  the  specification  S. 

If  S is  the  unknown  specification, 

{p}  {q}, 

then  the  pre-  and  post-conditions  associated  with  S are  p and  q. 

Definition: (Syntax of $L_A$) The set $L_A^B$ of annotated programs for the basis $B$ is defined inductively as follows:

a) Assignment statement. If $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ x := t\ \{q\}$

is an annotated program.

b) Composed statement. If $A_1, A_2$ are annotated programs, and $p, q$ are formulas from $WFF_B$, then

$\{p\}\ A_1 ; A_2\ \{q\}$

is an annotated program.

c) Conditional statement. If $A_1, A_2$ are annotated programs, $p, q$ are formulas from $WFF_B$, and $e$ is a quantifier free formula from $QFF_B$, then




$\{p\}$ if $e$ then $A_1$ else $A_2$ fi $\{q\}$

is an annotated program.

d) While statement. If $A_1$ is an annotated program, $p, q$ are formulas from $WFF_B$, and $e$ is a quantifier free formula from $QFF_B$, then

$\{p\}$ while $e$ do $A_1$ od $\{q\}$

is an annotated program.

We make a distinction in the preceding definitions between the sets of all while-programs, $L_W^B$, specifications, $L_S^B$, and annotated programs, $L_A^B$, and the corresponding sets along with an interpretation, which we denote by $\mathcal{L}_W$, $\mathcal{L}_S$, $\mathcal{L}_A$, respectively.

3.3.  The  Hoare  Logic  and  Calculus 

Given  that  an  implementation  is  a while-program,  a specification  is  in  terms  of  pre-  and 
post-conditions,  and  correctness  is  partial  correctness  of  while-programs  with  respect  to  these 
specifications,  it  is  necessary  to  have  a logic  and  a calculus  for  a discussion  of  a formal  develop- 
ment within  this  framework.  The  Hoare  logic  and  calculus  provide  a natural  means  for  reasoning 
about  such  a formal  development.  We  give  some  basic  definitions  and  state  some  results  concern- 
ing Hoare  logic  and  Hoare  calculus  which  we  use  in  later  sections  to  discuss  the  example  of  a for- 
mal development.  These  are  from  [3].  For  a survey  of  Hoare  logic,  see  [1]. 

Definition: (Syntax of Hoare Logic) Let $B$ be a basis for predicate logic. A Hoare formula over the basis $B$ is an expression of the form

$\{p\}\ W\ \{q\}$

where $p, q \in WFF_B$ are formulas of the predicate logic and $W \in L_W^B$ is a while program. The set of all Hoare formulas over $B$ is denoted by $HF_B$.

Definition: (Semantics of Hoare Logic) Let an interpretation $I$ of a basis $B$ for predicate logic be given, and let $\Sigma$ be the corresponding set of states. Every Hoare formula $\{p\}\ W\ \{q\} \in HF_B$ is mapped by a semantic functional, also denoted $I$, to a function

$I(\{p\}\ W\ \{q\}): \Sigma \to \mathrm{Bool}$

defined as follows:

$I(\{p\}\ W\ \{q\})(\sigma) = \mathit{true}$ iff, if $I(p)(\sigma) = \mathit{true}$ and $\mathcal{M}_I(W)(\sigma)$ is defined, then $I(q)(\mathcal{M}_I(W)(\sigma)) = \mathit{true}$.

Definition: A Hoare formula $\{p\}\ W\ \{q\}$ is said to be valid in an interpretation $I$, denoted by

$\models_I \{p\}\ W\ \{q\}$,

if $I(\{p\}\ W\ \{q\})(\sigma) = \mathit{true}$ for all states $\sigma \in \Sigma$.

We note that to say a Hoare formula $\{p\}\ W\ \{q\}$ is valid in an interpretation $I$ is a restatement, within the context of Hoare logic, of the fact that $W$ is partially correct with respect to the formulas $p$ and $q$ in the interpretation $I$. Equivalently, $\{p\}\ W\ \{q\}$ is valid in an interpretation $I$ if and only if $W$ is partially correct with respect to the unknown specification $\{p\}\ \{q\}$.

Definition: A Hoare formula $\{p\}\ W\ \{q\}$ is said to be logically valid, denoted by

$\models \{p\}\ W\ \{q\}$,

if $\models_I \{p\}\ W\ \{q\}$ for all interpretations $I$.

Definition: A Hoare formula $\{p\}\ W\ \{q\}$ is called a logical consequence of a set $\Gamma \subseteq WFF_B$ of formulas of the predicate logic, denoted by

$\Gamma \models \{p\}\ W\ \{q\}$,

if $\models_I \{p\}\ W\ \{q\}$ holds for all models $I$ of $\Gamma$.

Definition: The Hoare calculus (over a basis $B$ for predicate logic) is a calculus over the union of the set $HF_B$ of Hoare formulas and the set $WFF_B$ of formulas of the predicate logic, and consists of an axiom (scheme) and four inference rules:
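
(i) Assignment axiom

$\{p^x_t\}\ x := t\ \{p\}$

for all $p \in WFF_B$, $x \in V$, $t \in T_B$

(ii) Composition rule

$$\frac{\{p\}\ W_1\ \{r\}, \quad \{r\}\ W_2\ \{q\}}{\{p\}\ W_1 ; W_2\ \{q\}}$$

for all $p, q, r \in WFF_B$, $W_1, W_2 \in L_W^B$

(iii) Conditional rule

$$\frac{\{p \wedge e\}\ W_1\ \{q\}, \quad \{p \wedge \neg e\}\ W_2\ \{q\}}{\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}\ \{q\}}$$

for all $p, q \in WFF_B$, $e \in QFF_B$, $W_1, W_2 \in L_W^B$.

(iv) While rule

$$\frac{\{p \wedge e\}\ W_1\ \{p\}}{\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{p \wedge \neg e\}}$$

for all $p \in WFF_B$, $e \in QFF_B$, $W_1 \in L_W^B$.

(v) Consequence rule

$$\frac{p \Rightarrow q, \quad \{q\}\ W\ \{r\}, \quad r \Rightarrow s}{\{p\}\ W\ \{s\}}$$

for all $p, q, r, s \in WFF_B$, $W \in L_W^B$.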

Lemma: (Derived Rule) For all $p, q \in WFF_B$, $x \in V$, and $t \in T_B$ it follows that:

$$\frac{p \Rightarrow q^x_t}{\{p\}\ x := t\ \{q\}}$$

The following theorem states that the Hoare formulas which are derivable in the Hoare calculus from a set $\Gamma \subseteq WFF_B$ are logical consequences of $\Gamma$; that is, in an intuitive sense, the derivable formulas are true. In particular, this theorem holds when $\Gamma$ is a theory, such as $\mathrm{Th}(I)$.

Theorem: (Soundness of the Hoare Logic) Let $B$ be a basis for predicate logic, let $p, q \in WFF_B$, and $W \in L_W^B$. Then for each subset $\Gamma \subseteq WFF_B$ and each Hoare formula $\{p\}\ W\ \{q\} \in HF_B$:

if $\Gamma \vdash \{p\}\ W\ \{q\}$, then $\Gamma \models \{p\}\ W\ \{q\}$.

Lemma: (Hoare Logic is a First-order Logic) Let $B$ be a basis for predicate logic and $I$ an interpretation of $B$. Then for each Hoare formula $h \in HF_B$

$\models_I h$ iff $\mathrm{Th}(I) \models h$.

4. A Development

It  follows  from  Theorem  5 that  a complete  and  correct  development  can  be  obtained  from  a 
finite  sequence  of  correct  development  steps,  if  the  finite  sequence  meets  the  additional  require- 
ment that  the  final  step  in  the  development  is  complete.  This  reduces  the  problem  of  construct- 
ing a correct  and  complete  development  to  the  problem  of  constructing  a finite  sequence  of 
correct  development  steps,  the  last  step  also  being  complete. 

Within  the  framework  of  the  Hoare  calculus,  we  construct  an  example  of  a development  as  a 
finite  sequence  of  abstract  programs.  Given  a specification,  S,  we  need  to  be  able  to  associate 
with  it  a set  of  while-programs  which  are  partially  correct  with  respect  to  the  given  specification. 
This  will  enable  us  to  construct  an  abstract  program  from  the  specification.  We  have  the  notion 
of  partial  correctness  of  a while-program  with  respect  to  a specification.  We  need,  however,  a 
notion  in  terms  of  a derivation  within  the  Hoare  calculus,  which  will  imply  partial  correctness  of 
a while-program  with  respect  to  a specification. 

4.1.  Derivations  and  Partial  Correctness 

In this section we present some preliminary results which show the connection between partial correctness and derivations in the Hoare calculus from the theory of an interpretation. The following lemma connects a Hoare formula which is derivable from the theory of an interpretation with the notion of partial correctness with respect to unknown specifications.

Lemma: (Derivations from a Theory and Valid Hoare Formulas) Let $B$ be a basis for predicate logic and $I$ an interpretation of $B$. Then for each Hoare formula $h \in \mathrm{HF}_B$, if $\mathrm{Th}(I) \vdash h$, then $\models_I h$.

Proof: By the soundness of the Hoare calculus, if $\mathrm{Th}(I) \vdash h$, then $\mathrm{Th}(I) \models h$. By the lemma that Hoare logic is a first-order logic, it follows that $\models_I h$.
As an immediate consequence of this lemma, if there exists a derivation of the Hoare formula

$\{p\}\,W\,\{q\}$

from the theory of an interpretation, then $W$ is partially correct with respect to the unknown specification $\{p\}\ \{q\}$. This relationship between derivations from a theory and partial correctness is the content of the next lemma.

Lemma: (Derivations from a Theory and Partial Correctness) Let $B$ be a basis for predicate logic and $I$ an interpretation of $B$. Let $S$ be the unknown statement specification

$\{p\}\ \{q\}$.

Then for each Hoare formula $\{p\}\,W\,\{q\} \in \mathrm{HF}_B$, if $\mathrm{Th}(I) \vdash \{p\}\,W\,\{q\}$, then $W$ is partially correct with respect to the specification $S$.

Proof: By the preceding lemma, $\models_I \{p\}\,W\,\{q\}$. Therefore $W$ is partially correct with respect to the specification $S$.
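The lemmas above reach partial correctness through derivations, without running any program. For intuition only, the semantic property $\models_I \{p\}\,W\,\{q\}$ can be tested on a finite sample of states when $I$ is taken to be the usual interpretation over the integers. The following Python sketch assumes that interpretation; the AST classes, `run`, `partially_correct` and the step budget `fuel` are hypothetical names, and a run that exhausts the budget is treated as non-terminating, which partial correctness allows. A passing sample is evidence, not a proof.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Union

State = Dict[str, int]
Pred = Callable[[State], bool]
Term = Callable[[State], int]


@dataclass
class Assign:   # x := t
    var: str
    term: Term


@dataclass
class Seq:      # W1 ; W2
    first: "Prog"
    second: "Prog"


@dataclass
class If:       # if e then W1 else W2 fi
    cond: Pred
    then: "Prog"
    orelse: "Prog"


@dataclass
class While:    # while e do W1 od
    cond: Pred
    body: "Prog"


Prog = Union[Assign, Seq, If, While]


class OutOfFuel(Exception):
    """The step budget ran out; the run is treated as non-terminating."""


def run(w: Prog, s: State, fuel: List[int]) -> State:
    """Execute w from state s; fuel[0] is a step budget shared by the whole run."""
    if fuel[0] <= 0:
        raise OutOfFuel()
    fuel[0] -= 1
    if isinstance(w, Assign):
        return {**s, w.var: w.term(s)}
    if isinstance(w, Seq):
        return run(w.second, run(w.first, s, fuel), fuel)
    if isinstance(w, If):
        return run(w.then if w.cond(s) else w.orelse, s, fuel)
    while w.cond(s):  # remaining case: w is a While
        s = run(w.body, s, fuel)
    return s


def partially_correct(p: Pred, w: Prog, q: Pred,
                      states: Iterable[State], fuel: int = 10_000) -> bool:
    """Sampled check of |=_I {p} W {q}: each terminating run from p ends in q."""
    for s in states:
        if not p(s):
            continue
        try:
            final = run(w, dict(s), [fuel])
        except OutOfFuel:
            continue  # no terminating run observed, so nothing to check
        if not q(final):
            return False
    return True


# W is  i := 0 ; while i < n do i := i + 1 od
count_up = Seq(Assign("i", lambda s: 0),
               While(lambda s: s["i"] < s["n"],
                     Assign("i", lambda s: s["i"] + 1)))
sample = [{"n": n, "i": 7} for n in range(-3, 20)]
print(partially_correct(lambda s: s["n"] >= 0, count_up,
                        lambda s: s["i"] == s["n"], sample))  # True
```

A loop that never terminates passes the check for any postcondition, exactly as partial correctness permits.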

It is possible to associate with unknown specifications sets of while-programs which are defined in terms of derivations within the Hoare calculus. These sets have the property that each element is a while-program which is partially correct with respect to the unknown specification with which the set is associated. The following lemma constructs an abstract program from an unknown specification.


Lemma: Let $S$ be the unknown specification

$\{p\}\ \{q\}$,

and let

$C = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash \{p\}\,W\,\{q\} \,\}$.

Then $(S, C)$ is an abstract program.

Proof: We need to show that for each $W \in C$, $W$ is partially correct with respect to $S$. This follows from the preceding lemma.

4.2.  The  Construction  of  an  Abstract  Program 

In this section we introduce a definition which extends the notion of the deduction of a Hoare formula from a theory. This definition is used to associate a set $C$ of implementations with a specification $S$ from $L^{\mathcal{S}}$. This section also contains a theorem which shows that the pair $(S, C)$ is an abstract program; this extends the corresponding result for unknown specifications.

Definition: (Deduction Consistent with a Specification) Let $B$ be a basis for predicate logic, $W$ a while-program from $L^{\mathcal{W}}$, $I$ an interpretation of the basis $B$, $S$ a specification from $L^{\mathcal{S}}$, and $p'$, $q'$, respectively, the pre- and post-conditions associated with the specification $S$. The notion that there is a deduction from $\mathrm{Th}(I)$ to the Hoare formula $\{p'\}\,W\,\{q'\}$ consistent with $S$, denoted by

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$,

is defined inductively (the induction being on the specification $S$) as follows:

a) If $S$ is an unknown specification,

$\{p'\}\ \{q'\}$,

then

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$

if

(i) $\mathrm{Th}(I) \vdash \{p'\}\,W\,\{q'\}$.

b) If $S$ is an assignment specification,

$\{p'\}\ x := t\ \{q'\}$,

where $x$ is a variable from $V$ and $t$ is a term from $T_B$, then

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$

if

(i) $W$ is $x := t$;
(ii) $\mathrm{Th}(I) \vdash \{p'\}\,W\,\{q'\}$.

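c) If $S$ is a composed specification,

$\{p'\}\ S_1 ; S_2\ \{q'\}$,

where $S_1$, $S_2$ are specifications from $L^{\mathcal{S}}$, and $p_1$, $q_1$ and $p_2$, $q_2$ are the pre- and post-conditions associated with $S_1$ and $S_2$, respectively, then

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$

if

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L^{\mathcal{W}}$;
(ii) $\mathrm{Th}(I) \vdash \{p'\}\,W\,\{q'\}$;
(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$;
(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$.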

d) If $S$ is a conditional specification,

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$,

where $S_1$, $S_2$ are specifications from $L^{\mathcal{S}}$, $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, and $p_1$, $q_1$ and $p_2$, $q_2$ are the pre- and post-conditions associated with $S_1$ and $S_2$, respectively, then

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$

if

(i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L^{\mathcal{W}}$;
(ii) $\mathrm{Th}(I) \vdash \{p'\}\,W\,\{q'\}$;
(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$;
(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$.

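e) If $S$ is a while specification,

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$,

where $S_1$ is a specification from $L^{\mathcal{S}}$, $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, and $p_1$, $q_1$ are the pre- and post-conditions associated with $S_1$, then

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$

if

(i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L^{\mathcal{W}}$;
(ii) $\mathrm{Th}(I) \vdash \{p'\}\,W\,\{q'\}$;
(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$.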
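Clauses a) to e) form a structural recursion on $S$: at each node, $W$ must have the same outermost form as $S$ (with the same variable and term, or the same condition $e$), the whole Hoare formula must be derivable, and the components must be consistent with the component specifications. A minimal Python sketch of that recursion follows. It assumes formulas, terms and conditions are kept as opaque strings, and it takes a hypothetical oracle `derivable(pre, w, post)` standing in for $\mathrm{Th}(I) \vdash \{pre\}\,W\,\{post\}$; the definition does not say how such derivations are found, and neither does the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Union


# While-programs; terms and conditions are opaque strings in this sketch.
@dataclass(frozen=True)
class Assign:
    var: str
    term: str

@dataclass(frozen=True)
class Seq:
    first: "Prog"
    second: "Prog"

@dataclass(frozen=True)
class If:
    cond: str
    then: "Prog"
    orelse: "Prog"

@dataclass(frozen=True)
class While:
    cond: str
    body: "Prog"

Prog = Union[Assign, Seq, If, While]


# Specifications: every node carries its pre- and post-condition.
@dataclass(frozen=True)
class Unknown:
    pre: str
    post: str

@dataclass(frozen=True)
class AssignSpec:
    pre: str
    var: str
    term: str
    post: str

@dataclass(frozen=True)
class SeqSpec:
    pre: str
    first: "Spec"
    second: "Spec"
    post: str

@dataclass(frozen=True)
class IfSpec:
    pre: str
    cond: str
    then: "Spec"
    orelse: "Spec"
    post: str

@dataclass(frozen=True)
class WhileSpec:
    pre: str
    cond: str
    body: "Spec"
    post: str

Spec = Union[Unknown, AssignSpec, SeqSpec, IfSpec, WhileSpec]
Oracle = Callable[[str, Prog, str], bool]  # hypothetical stand-in for Th(I) |- {pre} W {post}


def consistent(s: Spec, w: Prog, derivable: Oracle) -> bool:
    """Th(I) |-^S {p'} W {q'}, following clauses a) to e) of the definition."""
    whole = derivable(s.pre, w, s.post)
    if isinstance(s, Unknown):                                    # a)
        return whole
    if isinstance(s, AssignSpec):                                 # b)
        return (isinstance(w, Assign)
                and (w.var, w.term) == (s.var, s.term) and whole)
    if isinstance(s, SeqSpec):                                    # c)
        return (isinstance(w, Seq) and whole
                and consistent(s.first, w.first, derivable)
                and consistent(s.second, w.second, derivable))
    if isinstance(s, IfSpec):                                     # d)
        return (isinstance(w, If) and w.cond == s.cond and whole
                and consistent(s.then, w.then, derivable)
                and consistent(s.orelse, w.orelse, derivable))
    return (isinstance(w, While) and w.cond == s.cond and whole   # e)
            and consistent(s.body, w.body, derivable))


spec = SeqSpec("p", AssignSpec("p", "x", "1", "r"), Unknown("r", "q"), "q")
trust_all = lambda pre, w, post: True  # toy oracle: every triple "derivable"
print(consistent(spec, Seq(Assign("x", "1"), Assign("y", "x")), trust_all))  # True
print(consistent(spec, Seq(Assign("x", "2"), Assign("y", "x")), trust_all))  # False
```

Even with an oracle that accepts every triple, the second program is rejected because its first component does not match the assignment specification $\{p\}\ x := 1\ \{r\}$.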

Lemma: Let $W \in L^{\mathcal{W}}$ and $S \in L^{\mathcal{S}}$, and let $p'$, $q'$ be the pre- and post-conditions associated with $S$. If

$\mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\}$,

then $W$ is partially correct with respect to the specification $S$.

Proof: This is an immediate consequence of the preceding definition, the definition of correctness with respect to specifications, and the lemma on derivations from a theory and partial correctness.

Note that in the case that $S$ is the unknown specification

$\{p\}\ \{q\}$,

$\mathrm{Th}(I) \vdash^{S} \{p\}\,W\,\{q\}$ reduces to $\mathrm{Th}(I) \vdash \{p\}\,W\,\{q\}$.

Just as the notion of partial correctness with respect to specifications extends the notion of partial correctness with respect to formulas, the notion of a deduction from the theory of an interpretation to a Hoare formula consistent with a specification extends the notion of a deduction from the theory of an interpretation to a Hoare formula. The preceding lemma gives the connection between derivations consistent with specifications and partial correctness of while-programs with respect to specifications. We use the next theorem in the construction of abstract programs from specifications.

Theorem: Let $S \in L^{\mathcal{S}}$, and let $p'$, $q'$ be the pre- and post-conditions associated with $S$. If $C$ is

$\{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\,W\,\{q'\} \,\}$,

then $(S, C)$ is an abstract program.

Proof: We need to show that for each $W \in C$, $W$ is partially correct with respect to $S$. This follows from the preceding lemma.

4.3.  The  Construction  of  a Development 

We recall that a development with respect to a specification $S_0$ is an $(n+1)$-tuple of abstract programs $(A_0, A_1, \ldots, A_n)$, for some nonnegative integer $n$, such that for each $i$, $0 \le i \le n$, $A_i = (S_i, C_i)$. Let $S_0$ be a given specification and let $p'$, $q'$ be the pre- and post-conditions associated with $S_0$. We can form an abstract program $A_0$ by defining $C_0$ to be

$\{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_0} \{p'\}\,W\,\{q'\} \,\}$.

The fact that $A_0$ is an abstract program is the main result of the preceding section. If the specification $S_0$ is itself an annotated program, then $|C_0| = 1$ and $(A_0)$ is a correct and complete development.

If $S_0$ is not an annotated program, then it “contains” an unknown specification. We prove this fact in the course of constructing a correct development step; the notion that a specification contains an unknown specification is defined precisely in the section on a correct development step. Assume that we have a way of constructing a correct development step; that is, from an abstract program $A_0 = (S_0, C_0)$ we can construct a new abstract program $A_1 = (S_1, C_1)$ such that $C_1 \subseteq C_0$, where $C_1$ is defined to be

$\{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_1} \{p'\}\,W\,\{q'\} \,\}$.

If $S_1$ is an annotated program, then $|C_1| = 1$. It follows that the pair $(A_0, A_1)$, which is

$((S_0, C_0), (S_1, C_1))$,

is a correct and complete development.

In general, if we have an incomplete development $(A_0, A_1, \ldots, A_{i-1})$, then, assuming that we have a way of constructing a correct development step, we can construct $(A_{i-1}, A_i)$, where $A_i$ is $(S_i, C_i)$ and $C_i$ is

$\{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_i} \{p'\}\,W\,\{q'\} \,\}$.

If $S_i$ is an annotated program, then $|C_i| = 1$ and $(A_0, A_1, \ldots, A_i)$ is a correct and complete development; otherwise, we continue by constructing a new correct development step $(A_i, A_{i+1})$. The abstract model describes an idealized development which always ends with an implementation that is correct with respect to the specification with which it is associated. Accordingly, in the example of a development within the framework of the Hoare calculus, we restrict ourselves, by assumption, to those cases for which there exist a nonnegative integer $n$ and abstract programs $A_0, A_1, \ldots, A_n$ such that $(A_0, A_1, \ldots, A_n)$ is a correct and complete development. In short, the example we present gives an explicit construction of a development, which we prove to be correct and complete under the assumption that a correct and complete development exists.
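The iteration just described (refine, test whether the specification is an annotated program, then stop or continue) can be sketched on the specification side alone. The Python below is a sketch under stated assumptions: specifications use a hypothetical tuple encoding, each step refines the leftmost unknown specification, and the refinement choices come from a scripted, hypothetical `plan`. The sets $C_i$ are not computed; the loop only records $S_0, S_1, \ldots, S_n$ and stops when $S_n$ contains no unknown specification. The `max_steps` bound stands in for the assumption above that a complete development exists.

```python
from typing import Callable, List, Tuple

# Hypothetical tuple encoding; element 1 is the precondition, the last the postcondition:
#   ("unknown", p, q)        ("assign", p, x, t, q)
#   ("seq", p, S1, S2, q)    ("if", p, e, S1, S2, q)    ("while", p, e, S1, q)
Spec = tuple
CHILD_SLOTS = {"seq": (2, 3), "if": (3, 4), "while": (3,)}


def contains_unknown(s: Spec) -> bool:
    return s[0] == "unknown" or any(
        contains_unknown(s[i]) for i in CHILD_SLOTS.get(s[0], ()))


def refine_leftmost(s: Spec, choose: Callable[[str, str], Spec]) -> Tuple[Spec, bool]:
    """Replace the leftmost unknown {p} {q} in s by choose(p, q); report success."""
    if s[0] == "unknown":
        new = choose(s[1], s[2])
        if (new[1], new[-1]) != (s[1], s[2]):
            raise ValueError("a transformation must preserve pre- and post-conditions")
        return new, True
    parts = list(s)
    for i in CHILD_SLOTS.get(s[0], ()):
        parts[i], done = refine_leftmost(parts[i], choose)
        if done:
            return tuple(parts), True
    return s, False


def develop(s0: Spec, choose: Callable[[str, str], Spec],
            max_steps: int = 100) -> List[Spec]:
    """Return S0, S1, ..., Sn, refining one unknown per step until none is left."""
    chain = [s0]
    while contains_unknown(chain[-1]):
        if len(chain) > max_steps:
            raise RuntimeError("no complete development found within max_steps")
        chain.append(refine_leftmost(chain[-1], choose)[0])
    return chain


# A scripted (hypothetical) sequence of refinement choices.
plan = {
    ("true", "y = 2"): ("seq", "true", ("unknown", "true", "x = 1"),
                        ("unknown", "x = 1", "y = 2"), "y = 2"),
    ("true", "x = 1"): ("assign", "true", "x", "1", "x = 1"),
    ("x = 1", "y = 2"): ("assign", "x = 1", "y", "x + 1", "y = 2"),
}
chain = develop(("unknown", "true", "y = 2"), lambda p, q: plan[(p, q)])
print(len(chain))  # 4
```

Refining one unknown per step keeps each step small; a different order of refinement would give a different, equally valid chain.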

In the next section, we give a construction of a correct development step. We define a specification transformation

$T : S_i \to S_{i+1}$,

and we give conditions under which a transformation $T$ preserves partial correctness. More precisely, we prove that for each nonnegative integer $i$, if there exists a suitable specification transformation

$T : S_i \to S_{i+1}$,

then $W \in C_i$ implies that $W \in C_{i+1}$. This result can be restated as follows: if we have a while-program which is partially correct with respect to $S_i$, and several other constraints concerning the transformation $T$ are satisfied, then this while-program is partially correct with respect to the new specification $S_{i+1}$.
5. A Correct Development Step

In the abstract model, a development step with respect to a specification $S$ is a pair of abstract programs $((S, C), (S', C'))$. In the example, it is necessary to define precisely a new abstract program

$(S', C')$,

given an abstract program

$(S, C)$.

Given a specification $S$, we define a transformation $T$ from $S$ to $S'$. We associate with $S'$ a set of implementations $C'$ such that $(S', C')$ is an abstract program. In order to define a specification transformation, we need to define the notion that a specification “contains” an unknown specification. We also prove two theorems which depend upon this definition. The first theorem relates specifications and annotated programs. The second theorem concerns the cardinality of the set of implementations which are partially correct with respect to a specification which is also an annotated program. The definition and theorems are in section 5.1. In section 5.2 we define a specification transformation

$T : S \to S'$

for the special case in which $S$ is an unknown specification. We introduce proof rules which are sufficient for the construction of a correct development step

$((S, C), (S', C'))$.

In section 5.3 we extend the definition of a specification transformation to a larger class of specifications than the unknown specifications. In section 5.4 we extend the notion of proof rules to this larger class of specifications.

It is also necessary to prove that each step in the development is correct for this larger class of specifications. In terms of the abstract model, this involves proving that $C' \subseteq C$. In section 5.5 we show that under the generalized specification transformation

$T : S \to S'$,

the pair of abstract programs

$((S, C), (S', C'))$

is a development step with the property that $C' \subseteq C$. In section 5.6 we prove that under the generalized specification transformation for which the proof rules hold, if $W \in C$, then $W \in C'$. Thus, given the existence of an appropriate specification transformation for which the proof rules hold, it is possible to prove that a while-program which is partially correct with respect to the specification $S$ is also partially correct with respect to the transformed (and more detailed) specification $S'$.


5.1. Specifications and Annotated Programs

In this section we lay the foundation for the construction of a correct development step. We formally define the notion that a specification “contains” an unknown statement specification. This formal definition corresponds to the meaning one would intuitively expect for the idea that one specification contains another. We use this definition in the proofs which occur in the construction of a correct development step.

Definition: (Syntax of Specifications Containing an Unknown Specification) The set of specifications which contain the unknown specification

$\{p\}\ \{q\}$,

for formulas $p$, $q$ from $\mathrm{WFF}_B$ for the basis $B$, is defined inductively, the induction being on a specification $S$ with pre- and post-conditions $p'$ and $q'$, respectively, as follows:

Basis

a) Unknown statement specification. If $S$ is the unknown specification

$\{p\}\ \{q\}$,

then $S$ contains the unknown specification $\{p\}\ \{q\}$.

Induction step

b) Composed statement specification. Let $S_1$, $S_2$ be specifications from $L^{\mathcal{S}}$, and suppose that either $S_1$ or $S_2$ contains the unknown specification $\{p\}\ \{q\}$. If $S$ is the composed statement specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$,

then $S$ contains the unknown specification $\{p\}\ \{q\}$.

c) Conditional statement specification. Let $S_1$, $S_2$ be specifications from $L^{\mathcal{S}}$, $e$ a quantifier-free formula from $\mathrm{QFF}_B$, and suppose that either $S_1$ or $S_2$ contains the unknown specification $\{p\}\ \{q\}$. If $S$ is the conditional statement specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$,

then $S$ contains the unknown specification $\{p\}\ \{q\}$.

d) While statement specification. Let $S_1$ be a specification from $L^{\mathcal{S}}$, $e$ a quantifier-free formula from $\mathrm{QFF}_B$, and suppose that $S_1$ contains the unknown specification $\{p\}\ \{q\}$. If $S$ is the while statement specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$,

then $S$ contains the unknown specification $\{p\}\ \{q\}$.
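The definition is an ordinary structural recursion. A minimal Python sketch, assuming a hypothetical tuple encoding of specifications in which the first component names the kind of specification, the second is the precondition and the last is the postcondition:

```python
# Hypothetical tuple encoding of specifications:
#   ("unknown", p, q)        ("assign", p, x, t, q)
#   ("seq", p, S1, S2, q)    ("if", p, e, S1, S2, q)    ("while", p, e, S1, q)
CHILD_SLOTS = {"seq": (2, 3), "if": (3, 4), "while": (3,)}


def contains(s: tuple, p: str, q: str) -> bool:
    """Does the specification s contain the unknown specification {p} {q}?"""
    if s[0] == "unknown":                       # a) basis
        return (s[1], s[2]) == (p, q)
    # b) to d): a composed, conditional or while specification contains {p} {q}
    # if one of its components does; an assignment specification has no
    # components and so contains no unknown specification.
    return any(contains(s[i], p, q) for i in CHILD_SLOTS.get(s[0], ()))


body = ("seq", "inv and i < n",
        ("unknown", "inv and i < n", "r"),
        ("assign", "r", "i", "i + 1", "inv"),
        "inv")
loop = ("while", "inv", "i < n", body, "inv and not i < n")
print(contains(loop, "inv and i < n", "r"))  # True
print(contains(loop, "r", "inv"))            # False: that part is an assignment
```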

The following theorem shows the relationship between specifications which do not contain unknown specifications and annotated programs.

Theorem: (Specifications and Annotated Programs) If $S \in L^{\mathcal{S}}$ does not contain any unknown statement specification, then $S$ is an annotated program.
Proof: The proof is by induction on the specification $S$.

a) If $S$ is an assignment specification

$\{p\}\ x := t\ \{q\}$,

where $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p$, $q$ are formulas from $\mathrm{WFF}_B$, then $S$ is an annotated program by definition.

b) If $S$ is a composed specification

$\{p\}\ S_1 ; S_2\ \{q\}$,

where $S_1$ and $S_2$ are specifications, $p$, $q$ are formulas from $\mathrm{WFF}_B$, and $S$ does not contain any unknown statement specification, then $S_1$ and $S_2$ also do not contain any unknown statement specification. By the induction hypothesis, $S_1$ and $S_2$ are annotated programs. It follows by definition that $S$ is an annotated program.

c) If $S$ is a conditional specification

$\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q\}$,

where $S_1$, $S_2$ are specifications, $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, $p$, $q$ are formulas from $\mathrm{WFF}_B$, and $S$ does not contain any unknown statement specification, then $S_1$ and $S_2$ also do not contain any unknown statement specification. By the induction hypothesis, $S_1$ and $S_2$ are annotated programs. It follows by definition that $S$ is an annotated program.

d) If $S$ is a while specification

$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q\}$,

where $S_1$ is a specification, $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, $p$, $q$ are formulas from $\mathrm{WFF}_B$, and $S$ does not contain any unknown statement specification, then $S_1$ also does not contain any unknown statement specification. By the induction hypothesis, $S_1$ is an annotated program. It follows by definition that $S$ is an annotated program.
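Read constructively, the induction in this proof is a recursive function that strips the annotations from a specification and fails only on an unknown specification. The Python sketch below assumes a hypothetical tuple encoding of specifications and while-programs; `erase` is an illustrative name. Because clauses b) to e) of the definition of $\vdash^{S}$ force a consistent $W$ to have the same form, variables, terms and conditions as $S$, `erase(S)` is the only candidate member of $C$, which is the content of the next theorem on $|C|$.

```python
# Hypothetical tuple encodings.
# Specifications:  ("unknown", p, q)        ("assign", p, x, t, q)
#                  ("seq", p, S1, S2, q)    ("if", p, e, S1, S2, q)    ("while", p, e, S1, q)
# While-programs:  ("assign", x, t)  ("seq", W1, W2)  ("if", e, W1, W2)  ("while", e, W1)


def erase(s: tuple) -> tuple:
    """Return the while-program of a specification with no unknown sub-specification."""
    kind = s[0]
    if kind == "unknown":
        raise ValueError(f"{{{s[1]}}} {{{s[2]}}} is an unknown specification")
    if kind == "assign":      # {p} x := t {q}  ->  x := t
        return ("assign", s[2], s[3])
    if kind == "seq":         # {p} S1 ; S2 {q}
        return ("seq", erase(s[2]), erase(s[3]))
    if kind == "if":          # {p} if e then S1 else S2 fi {q}
        return ("if", s[2], erase(s[3]), erase(s[4]))
    if kind == "while":       # {p} while e do S1 od {q}
        return ("while", s[2], erase(s[3]))
    raise ValueError(f"not a specification: {kind!r}")


annotated = ("seq", "true",
             ("assign", "true", "x", "1", "x = 1"),
             ("assign", "x = 1", "y", "x + 1", "y = 2"),
             "y = 2")
print(erase(annotated))  # ('seq', ('assign', 'x', '1'), ('assign', 'y', 'x + 1'))
```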

Intuitively, specifications which contain unknown specifications may not fully specify programs. We can consider such specifications as being in some sense “incomplete.” On the other hand, specifications which do not contain any unknown specifications are not “incomplete”, but can be associated with a specific program. The following theorem makes these ideas precise.

Theorem: Let $(S, C)$ be an abstract program. If $S$ does not contain any unknown specification and $C \neq \emptyset$, then $|C| = 1$.

Proof: The proof is by induction on the specification $S$.

a) If $S$ is an assignment statement specification

$\{p\}\ x := t\ \{q\}$,

where $x$ is a variable from $V$, $t$ is a term from $T_B$, and $p$, $q$ are formulas from $\mathrm{WFF}_B$, then

$C = \{\, x := t \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S} \{p\}\ x := t\ \{q\} \,\}$.

Since $C \neq \emptyset$, there exists a $W \in C$, and $W$ is $x := t$. It follows that $|C| = 1$.

b) If $S$ is a composed specification

$\{p\}\ S_1 ; S_2\ \{q\}$,

where $S_1$ and $S_2$ are specifications; $p, p_1, p_2, q, q_1, q_2$ are formulas from $\mathrm{WFF}_B$; $p_1$, $q_1$ are the pre- and post-conditions associated with $S_1$; $p_2$, $q_2$ are the pre- and post-conditions associated with $S_2$; $p$, $q$ are the pre- and post-conditions associated with $S$; and $S$ does not contain any unknown statement specification, then $S_1$ and $S_2$ also do not contain any unknown statement specification. Let

$C_1 = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W\,\{q_1\} \,\}$.

Let

$C_2 = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W\,\{q_2\} \,\}$.

By clause c) of the definition of $\vdash^{S}$, every $W \in C$ is $W_1 ; W_2$ for some $W_1 \in C_1$ and $W_2 \in C_2$. Since $C \neq \emptyset$, it follows that $C_1 \neq \emptyset$ and $C_2 \neq \emptyset$. By the induction hypothesis, $|C_1| = 1$ and $|C_2| = 1$. Therefore $|C| = 1$.


c) If $S$ is a conditional specification

$\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q\}$,

where $S_1$ and $S_2$ are specifications; $e$ is a quantifier-free formula from $\mathrm{QFF}_B$; $p, p_1, p_2, q, q_1, q_2$ are formulas from $\mathrm{WFF}_B$; $p_1$, $q_1$ are the pre- and post-conditions associated with $S_1$; $p_2$, $q_2$ are the pre- and post-conditions associated with $S_2$; $p$, $q$ are the pre- and post-conditions associated with $S$; and $S$ does not contain any unknown statement specification, then $S_1$ and $S_2$ also do not contain any unknown statement specification. Let

$C_1 = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W\,\{q_1\} \,\}$.

Let

$C_2 = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W\,\{q_2\} \,\}$.

By clause d) of the definition of $\vdash^{S}$, every $W \in C$ is

$\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$

for some $W_1 \in C_1$ and $W_2 \in C_2$. Since $C \neq \emptyset$, it follows that $C_1 \neq \emptyset$ and $C_2 \neq \emptyset$. By the induction hypothesis, $|C_1| = 1$ and $|C_2| = 1$. Therefore $|C| = 1$.

d) If $S$ is a while specification

$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q\}$,

where $S_1$ is a specification; $e$ is a quantifier-free formula from $\mathrm{QFF}_B$; $p, p_1, q, q_1$ are formulas from $\mathrm{WFF}_B$; $p_1$, $q_1$ are the pre- and post-conditions associated with $S_1$; $p$, $q$ are the pre- and post-conditions associated with $S$; and $S$ does not contain any unknown statement specification, then $S_1$ also does not contain any unknown statement specification. Let

$C_1 = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W\,\{q_1\} \,\}$.

By clause e) of the definition of $\vdash^{S}$, every $W \in C$ is

$\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$

for some $W_1 \in C_1$. Since $C \neq \emptyset$, it follows that $C_1 \neq \emptyset$. By the induction hypothesis, $|C_1| = 1$. Therefore $|C| = 1$.

5.2.  A Special  Case  of  a Correct  Development  Step 

Initially, we consider a somewhat simplified situation in which we wish to construct a correct development step. Consider the ordered pair $(S, C)$ for which $S$ has the form

$\{p\}\ \{q\}$,

where $p$ and $q$ are formulas from $\mathrm{WFF}_B$, and $C$ is the set of while-programs $W \in L^{\mathcal{W}}$ for which there exists a deduction in the Hoare calculus from the theory of the interpretation of the predicate logic to the Hoare formula $\{p\}\,W\,\{q\}$ consistent with $S$; that is,

$C = \{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S} \{p\}\,W\,\{q\} \,\}$.

From the abstract program $(S, C)$ we construct a new abstract program

$(S', C')$,

in which the specification $S'$ and the set of while-programs $C'$ are related to $S$ and $C$. The relationship involves the transformation of $S$ by changing the unknown specification into another specification. Using the notation of the abstract model, we have a transformation on specifications

$T : S \to S'$.

In terms of the example of the formal development, the transformation can be expressed as

$T : \{p\}\ \{q\} \to \{p\}\ S_1\ \{q\}$,

where $S_1 \in L^{\mathcal{S}}$ is either an assignment statement specification, a composed statement specification, a conditional statement specification, or a while statement specification. We give a formal definition of these transformations in this section.

Let $S'$ be $\{p\}\ S_1\ \{q\}$. $C'$ is the set of while-programs for which there exists a deduction in the Hoare calculus from the theory of the interpretation of the predicate logic to the Hoare formula $\{p\}\,W\,\{q\}$ consistent with $S'$; that is, $C'$ is

$\{\, W \in L^{\mathcal{W}} \mid \mathrm{Th}(I) \vdash^{S'} \{p\}\,W\,\{q\} \,\}$.

We assume that both $C \neq \emptyset$ and $C' \neq \emptyset$. This is an assumption that there exist while-programs which satisfy the specifications $S$ and $S'$. Since we are constructing an example of an idealized development, these assumptions are reasonable restrictions on the specifications. There are four possibilities for $C'$, corresponding to the four kinds of transformation from $\{p\}\ \{q\}$ to $\{p\}\ S_1\ \{q\}$. In this section we introduce conditions under which it is possible to guarantee that a while-program $W \in L^{\mathcal{W}}$ is in $C \cap C'$. When these conditions are satisfied, then for each transformation $T$ and each such while-program $W$, $W$ is partially correct with respect to both $S'$ and $S$.

Definition: (Specification Transformations, special case) A transformation $T$ from a specification $S$ which is an unknown statement specification $\{p\}\ \{q\}$, where $p$, $q$ are formulas from $\mathrm{WFF}_B$, to another specification $S'$, which is the image of $S$ under $T$, is defined as follows:

a) Assignment statement transformation. If $x$ is a variable from $V$ and $t$ is a term from $T_B$, then

$T : \{p\}\ \{q\} \to \{p\}\ x := t\ \{q\}$.

b) Composed statement transformation. If $p_1, p_2, q_1, q_2$ are formulas from $\mathrm{WFF}_B$, and $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$ are specifications, then

$T : \{p\}\ \{q\} \to \{p\}\ \{p_1\}\ \{q_1\} ; \{p_2\}\ \{q_2\}\ \{q\}$.

c) Conditional statement transformation. If $p_1, p_2, q_1, q_2$ are formulas from $\mathrm{WFF}_B$, $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$ are specifications, and $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, then

$T : \{p\}\ \{q\} \to \{p\}\ \mathbf{if}\ e\ \mathbf{then}\ \{p_1\}\ \{q_1\}\ \mathbf{else}\ \{p_2\}\ \{q_2\}\ \mathbf{fi}\ \{q\}$.

d) While statement transformation. If $p_1$, $q_1$ are formulas from $\mathrm{WFF}_B$, $\{p_1\}\ \{q_1\}$ is a specification, and $e$ is a quantifier-free formula from $\mathrm{QFF}_B$, then

$T : \{p\}\ \{q\} \to \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ \{p_1\}\ \{q_1\}\ \mathbf{od}\ \{q\}$.

We note that the pre- and post-conditions associated with both $S$ and $S'$ are $p$ and $q$. Thus, the transformation

$T : S \to S'$

preserves pre- and post-conditions.
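Each clause of the definition replaces $\{p\}\ \{q\}$ by a specification whose outer pre- and post-conditions are still $p$ and $q$, and whose new components are unknown specifications that later steps can refine. A Python sketch, assuming a hypothetical tuple encoding (kind first, precondition second, postcondition last); the function names are illustrative, and the component formulas in the example are arbitrary placeholders, since no proof rules are checked here:

```python
# Hypothetical tuple encoding of specifications:
#   ("unknown", p, q)        ("assign", p, x, t, q)
#   ("seq", p, S1, S2, q)    ("if", p, e, S1, S2, q)    ("while", p, e, S1, q)


def _unknown(s: tuple) -> tuple:
    """Return (p, q) if s is an unknown specification {p} {q}; otherwise raise."""
    if s[0] != "unknown":
        raise ValueError("the special-case transformations apply only to {p} {q}")
    return s[1], s[2]


def t_assign(s: tuple, x: str, t: str) -> tuple:          # a)
    p, q = _unknown(s)
    return ("assign", p, x, t, q)


def t_seq(s: tuple, p1: str, q1: str, p2: str, q2: str) -> tuple:          # b)
    p, q = _unknown(s)
    return ("seq", p, ("unknown", p1, q1), ("unknown", p2, q2), q)


def t_if(s: tuple, e: str, p1: str, q1: str, p2: str, q2: str) -> tuple:  # c)
    p, q = _unknown(s)
    return ("if", p, e, ("unknown", p1, q1), ("unknown", p2, q2), q)


def t_while(s: tuple, e: str, p1: str, q1: str) -> tuple:  # d)
    p, q = _unknown(s)
    return ("while", p, e, ("unknown", p1, q1), q)


def pre_post(s: tuple) -> tuple:
    return s[1], s[-1]


u = ("unknown", "0 <= n", "i = n")
images = [t_assign(u, "i", "n"),
          t_seq(u, "0 <= n", "i = 0", "i = 0", "i = n"),
          t_if(u, "n = 0", "n = 0", "i = n", "n > 0", "i = n"),
          t_while(u, "i < n", "i < n", "i <= n")]
print(all(pre_post(img) == pre_post(u) for img in images))  # True
```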

The four lemmas which follow give conditions under which it is possible to derive specific kinds of Hoare formulas. Each of these Hoare formulas is closely related to one of the four kinds of specification transformations. We call these conditions proof rules, since they are sufficient to guarantee the existence of derivations in the Hoare calculus which lead to a correct development step.

Lemma: (Assignment Statement Derivation) Let $T : S \to S'$ be an assignment statement transformation,

$T : \{p\}\ \{q\} \to \{p\}\ x := t\ \{q\}$.

Let $W \in L^{\mathcal{W}}$. Suppose that $W$ is $x := t$ for some $x \in V$ and $t \in T_B$. Let $p$, $q$ be formulas from $\mathrm{WFF}_B$ and let $\{p\}\ \{q\}$ be a specification from $L^{\mathcal{S}}$. Furthermore, assume that there exists a derivation of the following formula from the theory of the interpretation $I$, where $q^{x}_{t}$ is the formula obtained from $q$ by substituting $t$ for $x$:

a) $p \Rightarrow q^{x}_{t}$.

Then $W \in C \cap C'$.

Proof: We first prove that $W \in C$. Since $p \Rightarrow q^{x}_{t}$, it is a consequence of the derived rule that $\{p\}\ x := t\ \{q\}$; that is,

$\mathrm{Th}(I) \vdash \{p\}\ x := t\ \{q\}$.

Therefore $W \in C$.

If the following two conditions are satisfied:

i) $W$ is $x := t$
ii) $\mathrm{Th}(I) \vdash \{p\}\,W\,\{q\}$

then

$\mathrm{Th}(I) \vdash^{S'} \{p\}\,W\,\{q\}$

and $W \in C'$. Condition i) holds by assumption. Condition ii) is a consequence of a).

Definition: (Assignment Statement Proof Rule, special case) Let $T$, $W$, $x$, $t$, $p$, $q$, $I$, and condition a) be as in the preceding lemma. Then a) is called an assignment statement proof rule.

The preceding lemma shows that partial correctness with respect to specifications is preserved by assignment statement transformations if the assignment statement proof rule holds.
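The assignment statement proof rule asks for a derivation of $p \Rightarrow q^{x}_{t}$ from $\mathrm{Th}(I)$. Its semantic content can be illustrated, not established, by evaluating $q^{x}_{t}$ as $q$ in the state where $x$ has been given the value of $t$, and checking the implication on a finite sample of integer states. In this Python sketch the names `substitute` and `implies_on`, the sample, and the example formulas are all hypothetical:

```python
from itertools import product
from typing import Callable, Dict, Iterable

State = Dict[str, int]
Pred = Callable[[State], bool]
Term = Callable[[State], int]


def substitute(q: Pred, x: str, t: Term) -> Pred:
    """Semantic counterpart of q^x_t: evaluate q after giving x the value of t."""
    return lambda s: q({**s, x: t(s)})


def implies_on(a: Pred, b: Pred, states: Iterable[State]) -> bool:
    """Check a => b on every sampled state (evidence only, not a derivation)."""
    return all(b(s) for s in states if a(s))


states = [{"x": x, "y": y} for x, y in product(range(-5, 6), repeat=2)]
p = lambda s: s["y"] >= 0   # precondition
q = lambda s: s["x"] > 0    # postcondition
t = lambda s: s["y"] + 1    # W is  x := y + 1

print(implies_on(p, substitute(q, "x", t), states))               # True
print(implies_on(lambda s: True, substitute(q, "x", t), states))  # False (y = -5)
```

Because `substitute` evaluates $q$ in exactly the state that executing $x := t$ would produce, a sampled success of a) coincides with a sampled success of $\{p\}\ x := t\ \{q\}$.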

Lemma: (Composed Statement Derivation) Let $T : S \to S'$ be a composed statement transformation,

$T : \{p\}\ \{q\} \to \{p\}\ \{p_1\}\ \{q_1\} ; \{p_2\}\ \{q_2\}\ \{q\}$.

Let $W \in L^{\mathcal{W}}$. Suppose that $W$ is

$W_1 ; W_2$

for some $W_1, W_2 \in L^{\mathcal{W}}$. Let $p, p_1, p_2, q, q_1, q_2$ be formulas from $\mathrm{WFF}_B$, and let $\{p\}\ \{q\}$, $\{p_1\}\ \{q_1\}$, and $\{p_2\}\ \{q_2\}$ be specifications from $L^{\mathcal{S}}$. Furthermore, assume that there exist derivations of the following formulas from the theory of the interpretation $I$:

a) $p \Rightarrow p_1$
b) $q_1 \Rightarrow p_2$
c) $q_2 \Rightarrow q$
d) $\{p_1\}\,W_1\,\{q_1\}$
e) $\{p_2\}\,W_2\,\{q_2\}$.

Then $W \in C \cap C'$.

Proof: From the formulas $p \Rightarrow p_1$, $\{p_1\}\,W_1\,\{q_1\}$, and $q_1 \Rightarrow q_1$, it follows from rule (v) that $\{p\}\,W_1\,\{q_1\}$. Similarly, from $q_1 \Rightarrow p_2$, $\{p_2\}\,W_2\,\{q_2\}$, and $q_2 \Rightarrow q$, it follows from rule (v) that $\{q_1\}\,W_2\,\{q\}$. From $\{p\}\,W_1\,\{q_1\}$ and $\{q_1\}\,W_2\,\{q\}$ it follows from rule (ii) that $\{p\}\,W_1 ; W_2\,\{q\}$; that is,

$\mathrm{Th}(I) \vdash \{p\}\,W_1 ; W_2\,\{q\}$.

It follows that $W \in C$.

Let $S_1$ be $\{p_1\}\ \{q_1\}$ and $S_2$ be $\{p_2\}\ \{q_2\}$. If the following hold:

i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L^{\mathcal{W}}$
ii) $\mathrm{Th}(I) \vdash \{p\}\,W\,\{q\}$
iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$
iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$

then

$\mathrm{Th}(I) \vdash^{S'} \{p\}\,W\,\{q\}$

and $W \in C'$. Condition i) holds by assumption. Condition ii) is a consequence of a) to e). Condition iii) follows from d) and the fact that, since $S_1$ is an unknown specification,

$\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$

is

$\mathrm{Th}(I) \vdash \{p_1\}\,W_1\,\{q_1\}$.

Condition iv) follows from e) and the fact that

$\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$

is

$\mathrm{Th}(I) \vdash \{p_2\}\,W_2\,\{q_2\}$.

Definition: (Composed Statement Proof Rules, special case) Let $T$, $W$, $W_1$, $W_2$, $p$, $p_1$, $p_2$, $q$, $q_1$, $q_2$, $I$, and conditions a) to e) be as in the preceding lemma. Then a) to e) are called composed statement proof rules.

The preceding lemma shows that partial correctness with respect to specifications is preserved by composed statement transformations if the composed statement proof rules hold.
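Conditions a) to e) can likewise be spot-checked semantically. The Python sketch below models the programs $W_1$ and $W_2$ as functions on integer states (terminating assignments only), checks each condition on a finite sample, and then checks $\{p\}\,W_1 ; W_2\,\{q\}$ on the same sample. All names and example formulas are hypothetical, and a successful sample does not replace the derivations the lemma requires.

```python
from itertools import product
from typing import Callable, Dict, Iterable

State = Dict[str, int]
Pred = Callable[[State], bool]
Prog = Callable[[State], State]  # only terminating programs in this sketch


def implies_on(a: Pred, b: Pred, states: Iterable[State]) -> bool:
    """Sampled check of a => b."""
    return all(b(s) for s in states if a(s))


def triple_on(p: Pred, w: Prog, q: Pred, states: Iterable[State]) -> bool:
    """Sampled check of {p} w {q}."""
    return all(q(w(s)) for s in states if p(s))


def seq(w1: Prog, w2: Prog) -> Prog:
    """W1 ; W2"""
    return lambda s: w2(w1(s))


states = [{"x": x, "y": y, "z": z} for x, y, z in product(range(-3, 4), repeat=3)]
p, q = (lambda s: s["x"] >= 0), (lambda s: s["z"] > 0)
p1, q1 = p, (lambda s: s["y"] >= 1)     # W1 is  y := x + 1
p2, q2 = q1, (lambda s: s["z"] >= 2)    # W2 is  z := 2 * y
w1 = lambda s: {**s, "y": s["x"] + 1}
w2 = lambda s: {**s, "z": 2 * s["y"]}

rules = [implies_on(p, p1, states),      # a) p  => p1
         implies_on(q1, p2, states),     # b) q1 => p2
         implies_on(q2, q, states),      # c) q2 => q
         triple_on(p1, w1, q1, states),  # d) {p1} W1 {q1}
         triple_on(p2, w2, q2, states)]  # e) {p2} W2 {q2}
print(all(rules), triple_on(p, seq(w1, w2), q, states))  # True True
```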

Lemma: (Conditional Statement Derivation) Let $T : S \to S'$ be a conditional statement transformation,

$T : \{p\}\ \{q\} \to \{p\}\ \mathbf{if}\ e\ \mathbf{then}\ \{p_1\}\ \{q_1\}\ \mathbf{else}\ \{p_2\}\ \{q_2\}\ \mathbf{fi}\ \{q\}$.

Let $W \in L^{\mathcal{W}}$. Suppose that $W$ is

$\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some $W_1, W_2 \in L^{\mathcal{W}}$. Let $p, p_1, p_2, q, q_1, q_2$ be formulas from $\mathrm{WFF}_B$, and let $\{p\}\ \{q\}$, $\{p_1\}\ \{q_1\}$, and $\{p_2\}\ \{q_2\}$ be specifications from $L^{\mathcal{S}}$. Furthermore, assume that there exist derivations of the following formulas from the theory of the interpretation $I$:

a) $p \wedge e \Rightarrow p_1$
b) $q_1 \Rightarrow q$
c) $p \wedge \neg e \Rightarrow p_2$
d) $q_2 \Rightarrow q$
e) $\{p_1\}\,W_1\,\{q_1\}$
f) $\{p_2\}\,W_2\,\{q_2\}$.

Then $W \in C \cap C'$.

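Proof: Since $p \wedge \neg e \Rightarrow p_2$, $\{p_2\}\,W_2\,\{q_2\}$, and $q_2 \Rightarrow q$, it follows from rule (v) that $\{p \wedge \neg e\}\,W_2\,\{q\}$. Similarly, $\{p \wedge e\}\,W_1\,\{q\}$ follows from $p \wedge e \Rightarrow p_1$, $\{p_1\}\,W_1\,\{q_1\}$, $q_1 \Rightarrow q$, and rule (v). From $\{p \wedge e\}\,W_1\,\{q\}$ and $\{p \wedge \neg e\}\,W_2\,\{q\}$, it follows from rule (iii) that

$\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}\ \{q\}$;

that is,

$\mathrm{Th}(I) \vdash \{p\}\ \mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}\ \{q\}$.

Therefore $W \in C$.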

Let $S_1$ be $\{p_1\}\ \{q_1\}$ and $S_2$ be $\{p_2\}\ \{q_2\}$. If the following four conditions hold:

i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L^{\mathcal{W}}$
ii) $\mathrm{Th}(I) \vdash \{p\}\,W\,\{q\}$
iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$
iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$

then

$\mathrm{Th}(I) \vdash^{S'} \{p\}\,W\,\{q\}$

and $W \in C'$. Condition i) holds by assumption. Condition ii) is a consequence of a) to f). Condition iii) follows from e) and the fact that

$\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\,W_1\,\{q_1\}$

is

$\mathrm{Th}(I) \vdash \{p_1\}\,W_1\,\{q_1\}$.

Condition iv) follows from f) and the fact that

$\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\,W_2\,\{q_2\}$

is

$\mathrm{Th}(I) \vdash \{p_2\}\,W_2\,\{q_2\}$.

Definition: (Conditional Statement Proof Rules, special case) Let $T$, $W$, $W_1$, $W_2$, $p$, $p_1$, $p_2$, $q$, $q_1$, $q_2$, $e$, $I$, and conditions a) to f) be as in the preceding lemma. Then a) to f) are called conditional statement proof rules.

The preceding lemma shows that partial correctness with respect to specifications is preserved by conditional statement transformations if the conditional statement proof rules hold.
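The conditional proof rules a) to f) can be spot-checked the same way. In this hypothetical Python example, $W$ computes $y := |x|$ with $e$ being $x < 0$; the helper names, formulas and sample are assumptions, and the sampled check is an illustration rather than a derivation:

```python
from itertools import product
from typing import Callable, Dict, Iterable

State = Dict[str, int]
Pred = Callable[[State], bool]
Prog = Callable[[State], State]  # only terminating programs in this sketch


def implies_on(a: Pred, b: Pred, states: Iterable[State]) -> bool:
    """Sampled check of a => b."""
    return all(b(s) for s in states if a(s))


def triple_on(p: Pred, w: Prog, q: Pred, states: Iterable[State]) -> bool:
    """Sampled check of {p} w {q}."""
    return all(q(w(s)) for s in states if p(s))


def cond(e: Pred, w1: Prog, w2: Prog) -> Prog:
    """if e then W1 else W2 fi"""
    return lambda s: w1(s) if e(s) else w2(s)


states = [{"x": x, "y": y} for x, y in product(range(-4, 5), repeat=2)]
p, q = (lambda s: True), (lambda s: s["y"] >= 0)
e = lambda s: s["x"] < 0
p1, q1 = (lambda s: s["x"] < 0), (lambda s: s["y"] > 0)     # W1 is  y := -x
p2, q2 = (lambda s: s["x"] >= 0), (lambda s: s["y"] >= 0)   # W2 is  y := x
w1 = lambda s: {**s, "y": -s["x"]}
w2 = lambda s: {**s, "y": s["x"]}

rules = [implies_on(lambda s: p(s) and e(s), p1, states),      # a)
         implies_on(q1, q, states),                            # b)
         implies_on(lambda s: p(s) and not e(s), p2, states),  # c)
         implies_on(q2, q, states),                            # d)
         triple_on(p1, w1, q1, states),                        # e)
         triple_on(p2, w2, q2, states)]                        # f)
print(all(rules), triple_on(p, cond(e, w1, w2), q, states))  # True True
```

Swapping the branches breaks rules e) and f) on the sample, and the conditional then fails its postcondition as well.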

Lemma: (While Statement Derivation) Let $T : S \to S'$ be a while statement transformation,

$T : \{p\}\ \{q\} \to \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ \{p_1\}\ \{q_1\}\ \mathbf{od}\ \{q\}$.

Let $W \in L^{\mathcal{W}}$. Suppose that $W$ is

$\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some $W_1 \in L^{\mathcal{W}}$. Let $p, p_1, q, q_1$ be formulas from $\mathrm{WFF}_B$, and let $\{p\}\ \{q\}$ and $\{p_1\}\ \{q_1\}$ be specifications from $L^{\mathcal{S}}$. Furthermore, assume that there exist derivations of the following formulas from the theory of the interpretation $I$:

a) $p \wedge \neg e \Rightarrow q$
b) $p \wedge e \Rightarrow p_1$
c) $q_1 \Rightarrow p$
d) $\{p_1\}\,W_1\,\{q_1\}$.

Then $W \in C \cap C'$.

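Proof: We first prove that $W \in C$. From the formulas $p \wedge e \Rightarrow p_1$, $q_1 \Rightarrow p$, and $\{p_1\}\ W_1\ \{q_1\}$, it follows from rule (v) that $\{p \wedge e\}\ W_1\ \{p\}$. From $\{p \wedge e\}\ W_1\ \{p\}$ and rule (iii) we obtain

$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{p \wedge \neg e\}$.

Since $p \Rightarrow p$, $p \wedge \neg e \Rightarrow q$, and $\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{p \wedge \neg e\}$, it follows from rule (v) that

$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{q\}$.

Therefore, we have

$\mathrm{Th}(\mathcal{I}) \vdash \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{q\}$.

It follows that $W \in C$.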
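The same argument can be written as a proof tree. The first step uses the loop body to show that $p$ is invariant. The rule labels (iii) and (v) follow the text, and the layout assumes `amsmath`.

```latex
\[
\dfrac{
  p \Rightarrow p
  \qquad
  \dfrac{
    \dfrac{p \wedge e \Rightarrow p_1 \quad \{p_1\}\ W_1\ \{q_1\} \quad q_1 \Rightarrow p}
          {\{p \wedge e\}\ W_1\ \{p\}}\ \text{(v)}
  }{
    \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{p \wedge \neg e\}
  }\ \text{(iii)}
  \qquad
  p \wedge \neg e \Rightarrow q
}{
  \{p\}\ \mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}\ \{q\}
}\ \text{(v)}
\]
```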

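Let $S_1$ be $\{p_1\}\,\{q_1\}$. Suppose the following conditions hold:

i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_{\mathcal{W}}$;

ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p\}\ W\ \{q\}$;

iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

Then

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\}$

and $W \in C'$. Condition i) holds by assumption. Condition iii) follows from d) and the fact that

$\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$

is

$\mathrm{Th}(\mathcal{I}) \vdash \{p_1\}\ W_1\ \{q_1\}$.

Condition ii) is a consequence of a) through d).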

Definition: (While Statement Proof Rules, special case) Let $T$, $W$, $W_1$, $e$, $p$, $p_1$, $q$, $q_1$, $\mathcal{I}$, and conditions a) through d) be as in the preceding lemma. Then a) through d) are called while statement proof rules.

The  preceding  lemma  shows  that  partial  correctness  with  respect  to  specifications  is 
preserved  by  while  statement  transformations  if  the  while  statement  proof  rules  hold. 
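The while statement proof rules can be tried out on a small example. The sketch below uses a hypothetical loop, `while y != 0 do x := x + 1; y := y - 1 od`, with invariant $p: x + y = N \wedge y \ge 0$, and evaluates rules a) through d) on every state in a small finite range. This is a bounded test on sample states, not a derivation in $\mathrm{Th}(\mathcal{I})$. As with the partial-correctness reading in the text, termination is not checked, so the loop is only run from states that satisfy $p$.

```python
from itertools import product

N = 4  # hypothetical target value for the example loop

# Hypothetical finite state space of (x, y) pairs; checking all of them is a
# bounded test of the obligations, not a proof.
STATES = list(product(range(-1, N + 2), repeat=2))


def p(x, y):   # loop invariant p
    return x + y == N and y >= 0


def e(x, y):   # loop guard e
    return y != 0


def q(x, y):   # postcondition q
    return x == N


def p1(x, y):  # precondition p1 of the body specification
    return x + y == N and y > 0


def q1(x, y):  # postcondition q1 of the body specification
    return x + y == N and y >= 0


def w1(x, y):
    """Loop body W1:  x := x + 1; y := y - 1."""
    return x + 1, y - 1


def implies(a, b):
    """Check a => b on every sampled state."""
    return all(b(*s) for s in STATES if a(*s))


def check_while_rules():
    """Evaluate while statement proof rules a) to d) on STATES."""
    return {
        "a": implies(lambda x, y: p(x, y) and not e(x, y), q),
        "b": implies(lambda x, y: p(x, y) and e(x, y), p1),
        "c": implies(q1, p),
        # d) {p1} W1 {q1}: run W1 from every sampled p1-state and test q1.
        "d": all(q1(*w1(*s)) for s in STATES if p1(*s)),
    }


def run_loop(x, y):
    """Execute while e do W1 od; call it only from states satisfying p."""
    while e(x, y):
        x, y = w1(x, y)
    return x, y


print(check_while_rules())
print(run_loop(1, 3))
```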

Definition: (Proof Rules, special case) The assignment statement, composed statement, conditional statement, and while statement proof rules are called proof rules (for the specification $\{p\}\,\{q\}$).

We  combine  the  results  of  the  lemmas  of  this  section  to  obtain  the  following  two  theorems. 

Theorem: (Development Step, special case) Let $S$ be an unknown statement specification, $\{p\}\,\{q\}$. Let $T$ be any one of the four possible kinds of transformations,

$T: S \to S'$,

such that $S'$ is the specification $\{p\}\ S_1\ \{q\}$, and $S_1$ is an assignment statement specification, a composed statement specification, a conditional statement specification, or a while statement specification. Let $C$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S} \{p\}\ W\ \{q\} \,\}$

and let $C'$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\} \,\}$.

Assume that $C \neq \emptyset$ and $C' \neq \emptyset$, and that the proof rules (for the specification $\{p\}\,\{q\}$) hold. Then $(S, C)$ and $(S', C')$ are abstract programs and, for some $W \in L_{\mathcal{W}}$, $W \in C \cap C'$.

Proof: From the theorem of section 4.2 it follows that $(S, C)$ and $(S', C')$ are abstract programs. The fact that $W \in C \cap C'$ for some $W \in L_{\mathcal{W}}$ follows from the four preceding lemmas.

Theorem: (Development Step Correctness, special case) Let $S$ be an unknown statement specification, $\{p\}\,\{q\}$. Let $T$ be any one of the four possible kinds of transformations,

$T: S \to S'$,

such that $S'$ is the specification $\{p\}\ S_1\ \{q\}$, and $S_1$ is an assignment statement specification, a composed statement specification, a conditional statement specification, or a while statement specification. Let $C$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S} \{p\}\ W\ \{q\} \,\}$

and let $C'$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\} \,\}$.

Assume that $C' \subseteq C$, that $C' \neq \emptyset$, and that the proof rules (for the specification $\{p\}\,\{q\}$) hold. Then $(S, C)$ and $(S', C')$ are abstract programs and

$((S, C), (S', C'))$

is a correct development step.

Proof: From the preceding theorem, $(S, C)$ and $(S', C')$ are abstract programs. Since $C'$ is a subset of $C$ and $C' \neq \emptyset$, the theorem follows from the definition of a correct development step.
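This proof relies on two set-level conditions: $C' \subseteq C$ and $C' \neq \emptyset$. The sketch below checks exactly those two conditions on finite sets. It assumes that implementation sets can be listed explicitly, which is generally not true for $L_{\mathcal{W}}$. The program labels `W_a` and `W_b` are hypothetical placeholders, and the function does not capture the rest of the definition, such as $(S, C)$ and $(S', C')$ being abstract programs.

```python
def is_correct_development_step(c: frozenset, c_prime: frozenset) -> bool:
    """Return True when C' is non-empty and every element of C' is also in C."""
    return len(c_prime) > 0 and c_prime <= c


# Hypothetical implementation sets; programs are represented by labels only.
C = frozenset({"W_a", "W_b"})
C_PRIME = frozenset({"W_b"})

print(is_correct_development_step(C, C_PRIME))
print(is_correct_development_step(C, frozenset()))
```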

In section 5.5 we prove that the existence of a specification transformation implies that $C' \subseteq C$. This holds not only for the class of specification transformations considered in this section, but also for a more general class of specification transformations.

5.3.  Specification  Transformations 

Let $S \in L_{\{p\}\{q\}}$, so that $S$ is a specification which contains the unknown specification $\{p\}\,\{q\}$. The specification transformations defined next map specifications that contain unknown specifications to other specifications. They generalize the specification transformations defined for unknown specifications.

Definition: (Specification Transformations, general case) Let $S \in L_{\{p\}\{q\}}$ and let $p'$, $q'$ be the pre- and post-conditions associated with $S$. Let $S$ contain the unknown specification

$\{p\}\,\{q\}$,

where $p, q$ are formulas from $\mathrm{WFF}_B$. A transformation $T$ of $S$ to another specification $S'$, the image of $S$ under $T$, is defined inductively as follows:

a) Assignment statement transformation. Let $x$ be a variable from $V$, and $t$ a term from $T_B$.

(i) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then $S'$ is

$\{p\}\ x := t\ \{q\}$.

(ii) If $S$ is a composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2$ contains the unknown specification

$\{p\}\,\{q\}$.

If $S_1$ contains $\{p\}\,\{q\}$, then by the induction hypothesis there exists an assignment statement transformation

$T_1: S_1 \to S_1'$.

Define $T$ as an extension of $T_1$ from $S_1$ to $S$ as follows:

$T: \{p'\}\ S_1 ; S_2\ \{q'\} \to \{p'\}\ T_1(S_1) ; S_2\ \{q'\}$.

If $S_2$ contains $\{p\}\,\{q\}$, then by the induction hypothesis there exists an assignment statement transformation $T_2$ on $S_2$. In that case, $T$ is defined on $S$ as the extension of $T_2$ from $S_2$ to $S$.

(iii) If $S$ is a conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2$ contains the unknown specification

$\{p\}\,\{q\}$.

If $S_1$ contains $\{p\}\,\{q\}$, then by the induction hypothesis there exists an assignment statement transformation

$T_1: S_1 \to S_1'$.

Define $T$ as an extension of $T_1$ from $S_1$ to $S$ as follows:

$T: \{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\} \to \{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ T_1(S_1)\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$.

If $S_2$ contains $\{p\}\,\{q\}$, then by the induction hypothesis there exists an assignment statement transformation $T_2$ on $S_2$. In that case, $T$ is defined on $S$ as the extension of $T_2$ from $S_2$ to $S$.

(iv) If $S$ is a while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specification $S_1$ from $L_{\mathcal{S}}$, then $S_1$ contains the unknown specification

$\{p\}\,\{q\}$.

By the induction hypothesis, there exists an assignment statement transformation

$T_1: S_1 \to S_1'$.

Define $T$ as an extension of $T_1$ from $S_1$ to $S$ as follows:

$T: \{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\} \to \{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ T_1(S_1)\ \mathbf{od}\ \{q'\}$.

b) Composed statement transformation. Let $p_1, p_2, q_1, q_2$ be formulas from $\mathrm{WFF}_B$, and let $\{p_1\}\,\{q_1\}$ and $\{p_2\}\,\{q_2\}$ be specifications from $L_{\mathcal{S}}$. This part is similar to part a), except that the basis for the induction is the composed statement transformation

$T: \{p\}\,\{q\} \to \{p\}\ \{p_1\}\,\{q_1\} ; \{p_2\}\,\{q_2\}\ \{q\}$.

c) Conditional statement transformation. Let $p_1, p_2, q_1, q_2$ be formulas from $\mathrm{WFF}_B$, let $\{p_1\}\,\{q_1\}$ and $\{p_2\}\,\{q_2\}$ be specifications from $L_{\mathcal{S}}$, and let $e_1$ be a quantifier-free formula from $\mathrm{QFF}_B$. This part is similar to part a), except that the basis for the induction is the conditional statement transformation

$T: \{p\}\,\{q\} \to \{p\}\ \mathbf{if}\ e_1\ \mathbf{then}\ \{p_1\}\,\{q_1\}\ \mathbf{else}\ \{p_2\}\,\{q_2\}\ \mathbf{fi}\ \{q\}$.

d) While statement transformation. Let $p_1, q_1$ be formulas from $\mathrm{WFF}_B$, let $\{p_1\}\,\{q_1\}$ be a specification from $L_{\mathcal{S}}$, and let $e_1$ be a quantifier-free formula from $\mathrm{QFF}_B$. This part is similar to part a), except that the basis for the induction is the while statement transformation

$T: \{p\}\,\{q\} \to \{p\}\ \mathbf{while}\ e_1\ \mathbf{do}\ \{p_1\}\,\{q_1\}\ \mathbf{od}\ \{q\}$.

The definition just given for specification transformations is not quite precise enough. We need a definition that yields a unique specification transformation for each occurrence of the unknown specification

$\{p\}\,\{q\}$

in the specification $S$. One way to handle this is to distinguish between the occurrences of the unknown specification in $S$. For example, if there are $n$ occurrences of the specification

$\{p\}\,\{q\}$,

label them $\{p_1\}\,\{q_1\}, \{p_2\}\,\{q_2\}, \ldots, \{p_n\}\,\{q_n\}$. For each $i$, $1 \le i \le n$, a specification transformation of $S$ is then defined by the preceding definition, treating $S$ as containing a single occurrence of the unknown specification

$\{p_i\}\,\{q_i\}$.
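The inductive definition, together with the labelling of occurrences, can be read as a tree rewrite. The sketch below uses a hypothetical Python AST for specifications, with formulas as plain strings and an integer `label` on each unknown specification. The `transform` function replaces only the labelled unknown with the result of a basis transformation and rebuilds every enclosing specification unchanged, as in parts (ii) to (iv). It descends into all components, but components that do not contain the label are returned unchanged. The class and function names are illustrative and not part of the report. Only the assignment and composed bases are shown; the conditional and while bases would follow the same pattern.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical AST for specifications in L_S; formulas are plain strings.
@dataclass(frozen=True)
class Unknown:      # {p} {q}; the label distinguishes occurrences
    label: int
    pre: str
    post: str


@dataclass(frozen=True)
class Assign:       # {p} x := t {q}
    pre: str
    var: str
    term: str
    post: str


@dataclass(frozen=True)
class Seq:          # {p'} S1 ; S2 {q'}
    pre: str
    first: Spec
    second: Spec
    post: str


@dataclass(frozen=True)
class If:           # {p'} if e then S1 else S2 fi {q'}
    pre: str
    cond: str
    then: Spec
    other: Spec
    post: str


@dataclass(frozen=True)
class While:        # {p'} while e do S1 od {q'}
    pre: str
    cond: str
    body: Spec
    post: str


Spec = Union[Unknown, Assign, Seq, If, While]


def transform(s: Spec, label: int, basis: Callable[[Unknown], Spec]) -> Spec:
    """Replace the unknown with `label` by basis(unknown); enclosing
    specifications keep their pre- and post-conditions (parts (ii) to (iv))."""
    if isinstance(s, Unknown):
        return basis(s) if s.label == label else s
    if isinstance(s, Seq):
        return Seq(s.pre, transform(s.first, label, basis),
                   transform(s.second, label, basis), s.post)
    if isinstance(s, If):
        return If(s.pre, s.cond, transform(s.then, label, basis),
                  transform(s.other, label, basis), s.post)
    if isinstance(s, While):
        return While(s.pre, s.cond, transform(s.body, label, basis), s.post)
    return s  # an Assign contains no unknown specification


def assignment_basis(x: str, t: str) -> Callable[[Unknown], Spec]:
    """Basis a)(i): {p} {q}  ->  {p} x := t {q}."""
    return lambda u: Assign(u.pre, x, t, u.post)


def composed_basis(p1, q1, p2, q2, l1, l2) -> Callable[[Unknown], Spec]:
    """Basis b): {p} {q} -> {p} {p1} {q1} ; {p2} {q2} {q}, with new labels l1, l2."""
    return lambda u: Seq(u.pre, Unknown(l1, p1, q1), Unknown(l2, p2, q2), u.post)


s = Seq("p0", Unknown(1, "p0", "r"), Unknown(2, "r", "q0"), "q0")
print(transform(s, 1, assignment_basis("x", "0")))
```

Because every constructor copies the outer `pre` and `post` fields, the sketch also keeps the pre- and post-conditions of $S$ fixed under $T$.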


We note that if $p'$ and $q'$ are the pre- and post-conditions associated with $S$ in the specification transformation

$T: S \to S'$,

then the pre- and post-conditions associated with $S'$ are also $p'$ and $q'$.


5.4.  The  General  Case  for  Transformation  Proof  Rules 


In this section we generalize the notion of proof rules for the unknown specification

$\{p\}\,\{q\}$

to proof rules for specifications which contain the unknown specification $\{p\}\,\{q\}$. The definitions for the four kinds of specification transformations are all inductive and very similar to one another. For the sake of completeness, we give the proof rule definition for each kind of specification transformation.


Definition: (Assignment Statement Proof Rule, general case) Let $S$ be a specification from $L_{\mathcal{S}}$ with pre- and post-conditions $p'$ and $q'$. Suppose that $S$ contains the unknown specification

$\{p\}\,\{q\}$.

Whether the assignment statement proof rule (for the specification $\{p\}\,\{q\}$) holds for $S$ is defined inductively, the induction being on the specification $S$.

Basis

a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then, if the assignment statement proof rule (for the specification $\{p\}\,\{q\}$) holds, the assignment statement proof rule holds for $S$.

Induction step

b) If $S$ is the composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists an assignment statement specification transformation

$T_1: S_1 \to S_1'$

such that the assignment statement proof rule holds for $S_1$, then the assignment statement proof rule holds for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

c) If $S$ is the conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists an assignment statement specification transformation

$T_1: S_1 \to S_1'$

such that the assignment statement proof rule holds for $S_1$, then the assignment statement proof rule holds for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

d) If $S$ is the while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specification $S_1$ from $L_{\mathcal{S}}$, then $S_1 \in L_{\{p\}\{q\}}$. If there exists an assignment statement specification transformation

$T_1: S_1 \to S_1'$

such that the assignment statement proof rule holds for $S_1$, then the assignment statement proof rule holds for $S$.

Definition: (Composed Statement Proof Rules, general case) Let $S$ be a specification from $L_{\mathcal{S}}$ with pre- and post-conditions $p'$ and $q'$. Suppose that $S$ contains the unknown specification

$\{p\}\,\{q\}$.

Whether the composed statement proof rules (for the specification $\{p\}\,\{q\}$) hold for $S$ is defined inductively, the induction being on the specification $S$.

Basis

a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then, if the composed statement proof rules (for the specification $\{p\}\,\{q\}$) hold, the composed statement proof rules hold for $S$.

Induction step

b) If $S$ is the composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a composed statement specification transformation

$T_1: S_1 \to S_1'$

such that the composed statement proof rules hold for $S_1$, then the composed statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

c) If $S$ is the conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a composed statement specification transformation

$T_1: S_1 \to S_1'$

such that the composed statement proof rules hold for $S_1$, then the composed statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

d) If $S$ is the while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specification $S_1$ from $L_{\mathcal{S}}$, then $S_1 \in L_{\{p\}\{q\}}$. If there exists a composed statement specification transformation

$T_1: S_1 \to S_1'$

such that the composed statement proof rules hold for $S_1$, then the composed statement proof rules hold for $S$.

Definition: (Conditional Statement Proof Rules, general case) Let $S$ be a specification from $L_{\mathcal{S}}$ with pre- and post-conditions $p'$ and $q'$. Suppose that $S$ contains the unknown specification

$\{p\}\,\{q\}$.

Whether the conditional statement proof rules (for the specification $\{p\}\,\{q\}$) hold for $S$ is defined inductively, the induction being on the specification $S$.

Basis

a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then, if the conditional statement proof rules (for the specification $\{p\}\,\{q\}$) hold, the conditional statement proof rules hold for $S$.

Induction step

b) If $S$ is the composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a conditional statement specification transformation

$T_1: S_1 \to S_1'$

such that the conditional statement proof rules hold for $S_1$, then the conditional statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

c) If $S$ is the conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a conditional statement specification transformation

$T_1: S_1 \to S_1'$

such that the conditional statement proof rules hold for $S_1$, then the conditional statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

d) If $S$ is the while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specification $S_1$ from $L_{\mathcal{S}}$, then $S_1 \in L_{\{p\}\{q\}}$. If there exists a conditional statement specification transformation

$T_1: S_1 \to S_1'$

such that the conditional statement proof rules hold for $S_1$, then the conditional statement proof rules hold for $S$.

Definition: (While Statement Proof Rules, general case) Let $S$ be a specification from $L_{\mathcal{S}}$ with pre- and post-conditions $p'$ and $q'$. Suppose that $S$ contains the unknown specification

$\{p\}\,\{q\}$.

Whether the while statement proof rules (for the specification $\{p\}\,\{q\}$) hold for $S$ is defined inductively, the induction being on the specification $S$.

Basis

a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then, if the while statement proof rules (for the specification $\{p\}\,\{q\}$) hold, the while statement proof rules hold for $S$.

Induction step

b) If $S$ is the composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a while statement specification transformation

$T_1: S_1 \to S_1'$

such that the while statement proof rules hold for $S_1$, then the while statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.




July  29,  1986 


DRAFT 


c) If $S$ is the conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. If there exists a while statement specification transformation

$T_1: S_1 \to S_1'$

such that the while statement proof rules hold for $S_1$, then the while statement proof rules hold for $S$. If $S_2 \in L_{\{p\}\{q\}}$, the definition is similar.

d) If $S$ is the while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specification $S_1$ from $L_{\mathcal{S}}$, then $S_1 \in L_{\{p\}\{q\}}$. If there exists a while statement specification transformation

$T_1: S_1 \to S_1'$

such that the while statement proof rules hold for $S_1$, then the while statement proof rules hold for $S$.

Definition: (Proof Rules, general case) Let $S$ be a specification from $L_{\mathcal{S}}$ with pre- and post-conditions $p'$ and $q'$. Suppose that $S$ contains the unknown specification

$\{p\}\,\{q\}$.

If there is an assignment statement (composed statement, conditional statement, while statement, respectively) transformation on $S$ such that the assignment statement (composed statement, conditional statement, while statement, respectively) proof rules (for the specification $\{p\}\,\{q\}$) hold for $S$, then the proof rules (for the specification $\{p\}\,\{q\}$) hold for $S$.
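Each of the four general-case definitions has the same shape. The induction steps descend into the component that contains the unknown specification, and the basis applies the special-case rules. The sketch below reads this as a recursive check over a hypothetical tuple encoding of specifications. The caller supplies the special-case test as the callback `basis_holds`, which is an assumption of the sketch and not something the report defines. The definitions in the text are stated only as "if" conditions; treating them as a decision procedure is an interpretation.

```python
# Hypothetical tuple encoding of specifications:
#   ("unknown", label, pre, post)
#   ("assign", pre, var, term, post)
#   ("seq", pre, S1, S2, post)
#   ("if", pre, cond, S1, S2, post)
#   ("while", pre, cond, S1, post)


def children(spec):
    """Sub-specifications of a composed, conditional, or while specification."""
    kind = spec[0]
    if kind == "seq":
        return spec[2:4]
    if kind == "if":
        return spec[3:5]
    if kind == "while":
        return spec[3:4]
    return ()


def contains(spec, label):
    """True if spec is, or contains, the unknown specification with this label."""
    if spec[0] == "unknown":
        return spec[1] == label
    return any(contains(c, label) for c in children(spec))


def proof_rules_hold(spec, label, basis_holds):
    """Follow the induction step down to the labelled unknown specification,
    then apply the special-case check basis_holds there (the Basis)."""
    if spec[0] == "unknown":
        return spec[1] == label and basis_holds(spec)
    for child in children(spec):
        if contains(child, label):
            return proof_rules_hold(child, label, basis_holds)
    return False


spec = ("while", "p'", "e",
        ("seq", "a", ("unknown", 1, "a", "b"), ("assign", "b", "x", "0", "c"), "c"),
        "q'")
print(proof_rules_hold(spec, 1, lambda u: True))
```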
5.5. Sets of Implementations Related by Set Inclusion

Let $(S, C)$ be an abstract program and let $(S', C')$ be the abstract program obtained from $(S, C)$ by a specification transformation $T$ from $S$ to $S'$. In this section we prove that $C' \subseteq C$. This set inclusion on the implementations is one of the requirements for a correct development step in the abstract model, and it is an immediate consequence of the theorem proved in this section. The theorem requires four lemmas, one for each kind of specification transformation $T$ that can be used to transform $S$ to $S'$. The proof of each lemma is rather involved because of the induction on the specifications, but the basic idea is simple. Each proof can be summarized as follows: any while-program which is partially correct with respect to a given specification must also be partially correct with respect to a less detailed specification which is consistent with the given specification.

Lemma: (Correctness of Assignment Statement Implementations) Let $T$ be an assignment statement transformation of the specification $S$, which contains the unknown specification

$\{p\}\,\{q\}$,

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p'$, $q'$ be the pre- and post-conditions associated with $S$ and $S'$. Let $C$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$

and let $C'$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}$.

For each $W \in C'$, $W \in C$; that is, $C' \subseteq C$.

Proof: The proof is by induction on the specification $S$. Associated with the specific assignment statement transformation $T$ are a variable $x \in V$ and a term $t$ from $T_B$.

a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then $S'$ is

$\{p\}\ x := t\ \{q\}$.

If $W \in C'$, then

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\}$.

It follows that

(i) $W$ is $x := t$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p\}\ W\ \{q\}$.

Conditions (i) and (ii) imply that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p\}\ W\ \{q\}$,

or $W \in C$.
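The basis case can be illustrated on finite data. The sketch below makes several assumptions that are not in the report: states are $(x, y)$ pairs over a small range, $p$ is $y \ge 0$, $q$ is $x \ge 0$, and each hypothetical candidate program is a Python function on states tagged with its syntactic shape. $C$ collects the candidates that pass a bounded check of $\{p\}\ W\ \{q\}$. $C'$ additionally requires the shape $x := t$, mirroring conditions (i) and (ii). The inclusion $C' \subseteq C$ then holds by construction, which is the intuition behind the lemma. Sampling states only illustrates the argument and does not prove it.

```python
from itertools import product

# Hypothetical finite setting: (x, y) states over a small range.  Only these
# states are sampled, so this illustrates the lemma rather than proving it.
STATES = list(product(range(-2, 3), repeat=2))


def pre(x, y):   # p
    return y >= 0


def post(x, y):  # q
    return x >= 0


# (program text, shape, semantics); shape "assign:x" marks a program x := t.
CANDIDATES = [
    ("x := y", "assign:x", lambda x, y: (y, y)),
    ("x := y * y", "assign:x", lambda x, y: (y * y, y)),
    ("x := x", "assign:x", lambda x, y: (x, y)),
    ("y := 0; x := 1", "seq", lambda x, y: (1, 0)),
]


def partially_correct(f):
    """Bounded check of {p} W {q}: every sampled p-state is mapped to a q-state."""
    return all(post(*f(*s)) for s in STATES if pre(*s))


# C: candidates with {p} W {q}.  C': those that also have the form x := t.
C = {name for name, _, f in CANDIDATES if partially_correct(f)}
C_PRIME = {name for name, shape, f in CANDIDATES
           if shape == "assign:x" and partially_correct(f)}

print(sorted(C_PRIME), C_PRIME <= C)
```

The program `y := 0; x := 1` is in $C$ but not in $C'$. This shows that the inclusion is usually strict: a more detailed specification admits fewer implementations.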


b) If $S$ is a composed specification

$\{p'\}\ S_1 ; S_2\ \{q'\}$

for some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. If $S_1 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ S_1' ; S_2\ \{q'\}$,

where $S_1'$ is the image of $S_1$ under an assignment statement transformation

$T_1: S_1 \to S_1'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_1, q_1$ and $p_2, q_2$ associated with $S_1'$ and $S_2$, respectively,

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

Using the induction hypothesis, if $W_1 \in L_{\mathcal{W}}$ satisfies (iii), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), (iv), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_2 \in L_{\{p\}\{q\}}$, the proof is similar.

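c) If $S$ is a conditional specification

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$

for some quantifier-free formula $e$ from $\mathrm{QFF}_B$ and some specifications $S_1, S_2$ from $L_{\mathcal{S}}$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. If $S_1 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1'\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\}$,

where $S_1'$ is the image of $S_1$ under an assignment statement transformation

$T_1: S_1 \to S_1'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_1, q_1$ and $p_2, q_2$ associated with $S_1'$ and $S_2$, respectively,

(i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.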




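(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

Using the induction hypothesis, if $W_1 \in L_{\mathcal{W}}$ satisfies (iii), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), (iv), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_2 \in L_{\{p\}\{q\}}$, the proof is similar.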


d) If $S$ is a while specification

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\}$

for some specification $S_1 \in L_{\{p\}\{q\}}$ and some quantifier-free formula $e$ from $\mathrm{QFF}_B$, then $S'$ is

$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1'\ \mathbf{od}\ \{q'\}$,

where $S_1'$ is the image of $S_1$ under an assignment statement transformation

$T_1: S_1 \to S_1'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_1, q_1$ associated with $S_1'$,

(i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

Using the induction hypothesis, if $W_1 \in L_{\mathcal{W}}$ satisfies (iii), then

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), and (iv) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$.

Lemma: (Correctness of Composed Statement Implementations) Let $T$ be a composed statement transformation of the specification $S$, which contains the unknown specification

$\{p\}\,\{q\}$,

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p'$, $q'$ be the pre- and post-conditions associated with $S$ and $S'$. Let $C$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$

and let $C'$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}$.

For each $W \in C'$, $W \in C$; that is, $C' \subseteq C$.

Proof: The proof is by induction on the specification $S$. Associated with the composed statement transformation $T$ are formulas $p_1, p_2, q_1, q_2$ from $\mathrm{WFF}_B$ and the specifications $\{p_1\}\,\{q_1\}$ and $\{p_2\}\,\{q_2\}$ from $L_{\mathcal{S}}$.

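a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then $S'$ is

$\{p\}\ S_1 ; S_2\ \{q\}$,

where $S_1$ is $\{p_1\}\,\{q_1\}$ and $S_2$ is $\{p_2\}\,\{q_2\}$. If $W \in C'$, then

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\}$.

It follows that

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p\}\ W\ \{q\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

As a consequence of (i) to (iv), it follows that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p\}\ W\ \{q\}$,

or $W \in C$.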

b) If $S$ is a composed specification

$\{p'\}\ S_3 ; S_4\ \{q'\}$

for some specifications $S_3, S_4$ from $L_{\mathcal{S}}$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_3 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ S_3' ; S_4\ \{q'\}$,

where $S_3'$ is the image of $S_3$ under a composed statement transformation

$T_3: S_3 \to S_3'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3'$ and $S_4$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_3 \in L_{\mathcal{W}}$ satisfies (iii), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), (iv), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_4 \in L_{\{p\}\{q\}}$, the proof is similar.


c) If $S$ is a conditional specification

$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\}$

for some specifications $S_3, S_4$ from $L_{\mathcal{S}}$ and some quantifier-free formula $e_1$ from $\mathrm{QFF}_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_3 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3'\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\}$,

where $S_3'$ is the image of $S_3$ under the composed statement transformation

$T_3: S_3 \to S_3'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3'$ and $S_4$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_3 \in L_{\mathcal{W}}$ satisfies (iii), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), (iv), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_4 \in L_{\{p\}\{q\}}$, the proof is similar.


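d) If $S$ is a while specification

$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\}$

for some specification $S_3$ from $L_{\mathcal{S}}$ and some quantifier-free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3'\ \mathbf{od}\ \{q'\}$,

where $S_3'$ is the image of $S_3$ under the composed statement transformation

$T_3: S_3 \to S_3'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for pre- and post-conditions $p_3, q_3$ associated with $S_3'$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, if $W_3 \in L_{\mathcal{W}}$ satisfies (iii), then

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), and (iv) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$.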


Lemma: (Correctness of Conditional Statement Implementations) Let $T$ be a conditional statement transformation of the specification $S$, which contains the unknown specification

$\{p\}\,\{q\}$,

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p'$, $q'$ be the pre- and post-conditions associated with $S$ and $S'$. Let $C$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$

and let $C'$ be

$\{\, W \in L_{\mathcal{W}} \mid \mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}$.

For each $W \in C'$, $W \in C$; that is, $C' \subseteq C$.

Proof: The proof is by induction on the specification $S$. Associated with the conditional statement transformation $T$ are the quantifier-free formula $e$ from $\mathrm{QFF}_B$, the formulas $p_1, p_2, q_1, q_2$ from $\mathrm{WFF}_B$, and the specifications $\{p_1\}\,\{q_1\}$ and $\{p_2\}\,\{q_2\}$ from $L_{\mathcal{S}}$. Let $S_1$ be $\{p_1\}\,\{q_1\}$ and let $S_2$ be $\{p_2\}\,\{q_2\}$.


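a) If $S$ is the unknown specification

$\{p\}\,\{q\}$,

then $S'$ is

$\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q\}$.

If $W \in C'$, then

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p\}\ W\ \{q\}$.

It follows that

(i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p\}\ W\ \{q\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

As a consequence of (i) to (iv), it follows that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p\}\ W\ \{q\}$,

or $W \in C$.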


b) If $S$ is a composed specification

$\{p'\}\ S_3 ; S_4\ \{q'\}$

for some specifications $S_3, S_4$ from $L_{\mathcal{S}}$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_4 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ S_3 ; S_4'\ \{q'\}$,

where $S_4'$ is the image of $S_4$ under the conditional statement transformation

$T_4: S_4 \to S_4'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4'$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4'} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_4 \in L_{\mathcal{W}}$ satisfies (iv), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

It follows from (i), (ii), (iii), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_3 \in L_{\{p\}\{q\}}$, the proof is similar.
c) If $S$ is a conditional specification

$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\}$

for some specifications $S_3, S_4$ from $L_{\mathcal{S}}$ and some quantifier-free formula $e_1$ from $\mathrm{QFF}_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_4 \in L_{\{p\}\{q\}}$, then $S'$ is

$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4'\ \mathbf{fi}\ \{q'\}$,

where $S_4'$ is the image of $S_4$ under the conditional statement transformation

$T_4: S_4 \to S_4'$.

Let $W \in C'$. Since

$\mathrm{Th}(\mathcal{I}) \vdash^{S'} \{p'\}\ W\ \{q'\}$,

it follows that, for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4'$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_{\mathcal{W}}$.

(ii) $\mathrm{Th}(\mathcal{I}) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(\mathcal{I}) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4'} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_4 \in L_{\mathcal{W}}$ satisfies (iv), then

(v) $\mathrm{Th}(\mathcal{I}) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

It follows from (i), (ii), (iii), and (v) that

$\mathrm{Th}(\mathcal{I}) \vdash^{S} \{p'\}\ W\ \{q'\}$,

or $W \in C$. If we assume that $S_3 \in L_{\{p\}\{q\}}$, the proof is similar.


d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\},$$

for some specification $S_3$ from $L^S_B$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3'\ \mathbf{od}\ \{q'\},$$

where $S_3'$ is the specification which is the image of $S_3$ under the conditional statement transformation,

$$T_3: S_3 \to S_3'.$$

Let $W \in C'$. Since

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\},$$

it follows that for some pre- and post-conditions $p_3, q_3$ associated with $S_3$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, if $W_3 \in L_B$ satisfies (iii), then

(iv) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\},$$

or $W \in C$.
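Condition (i) in each case above is purely structural: $W$ must have the same control skeleton as the specification, with each unknown specification filled by some program. The following minimal Python sketch checks only that structural condition, under an assumed toy abstract syntax; the names `Unknown`, `Assign`, `Seq`, `If`, `While`, `is_program`, and `has_shape` are hypothetical and not part of the report, guards and terms are kept as plain strings, and the logical conditions (ii)-(v) are not checked at all.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy abstract syntax; not the report's actual languages.
@dataclass(frozen=True)
class Unknown:        # an unknown specification {pre} {post}
    pre: str
    post: str


@dataclass(frozen=True)
class Assign:         # x := t
    var: str
    term: str


@dataclass(frozen=True)
class Seq:            # first ; second
    first: "Node"
    second: "Node"


@dataclass(frozen=True)
class If:             # if cond then then_ else else_ fi
    cond: str
    then_: "Node"
    else_: "Node"


@dataclass(frozen=True)
class While:          # while cond do body od
    cond: str
    body: "Node"


Node = Union[Unknown, Assign, Seq, If, While]


def is_program(w: Node) -> bool:
    """True if w contains no unknown specification, i.e. w is a program."""
    if isinstance(w, Unknown):
        return False
    if isinstance(w, Assign):
        return True
    if isinstance(w, Seq):
        return is_program(w.first) and is_program(w.second)
    if isinstance(w, If):
        return is_program(w.then_) and is_program(w.else_)
    if isinstance(w, While):
        return is_program(w.body)
    raise TypeError(w)


def has_shape(w: Node, s: Node) -> bool:
    """Condition (i) only: program w has the control structure of spec s."""
    if isinstance(s, Unknown):
        return is_program(w)          # any program may fill an unknown spec
    if isinstance(s, Assign):
        return w == s
    if isinstance(s, Seq):
        return (isinstance(w, Seq) and has_shape(w.first, s.first)
                and has_shape(w.second, s.second))
    if isinstance(s, If):
        return (isinstance(w, If) and w.cond == s.cond
                and has_shape(w.then_, s.then_) and has_shape(w.else_, s.else_))
    if isinstance(s, While):
        return (isinstance(w, While) and w.cond == s.cond
                and has_shape(w.body, s.body))
    raise TypeError(s)
```

For example, `Seq(Assign("y", "1"), While("e1", Assign("x", "x - 1")))` has the shape of `Seq(Unknown("p3", "q3"), While("e1", Unknown("p", "q")))`, while the same program with guard `"e2"` does not.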

Lemma: (Correctness of While Statement Implementations) Let $T$ be a while statement transformation of the specification $S$, which contains the unknown specification,

$$\{p\}\ \{q\},$$

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p', q'$ be the pre- and post-conditions associated with $S$ and $S'$. Let $C$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$$

and let $C'$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

For each $W \in C'$, $W \in C$; that is, $C' \subseteq C$.

Proof: The proof is by induction on the specification $S$. Associated with the specific while statement transformation $T$ are formulas $p_1, q_1$ from $\mathrm{WFF}_B$, a specification $\{p_1\}\ \{q_1\}$ from $L^S_B$, and a quantifier free formula $e$ from $\mathrm{QFF}_B$. Let $S_1$ be $\{p_1\}\ \{q_1\}$.


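a) If $S$ is the unknown specification,

$$\{p\}\ \{q\},$$

then $S'$ is

$$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q\}.$$

If $W \in C'$, then

$$\mathrm{Th}(I) \vdash^{S'} \{p\}\ W\ \{q\}.$$

It follows that

(i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p\}\ W\ \{q\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

As a consequence of (i)-(iii), it follows that

$$\mathrm{Th}(I) \vdash^{S} \{p\}\ W\ \{q\},$$

or $W \in C$.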


b) If $S$ is a composed specification,

$$\{p'\}\ S_3 ;\ S_4\ \{q'\},$$

for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_3 \in L_{\{p\}\{q\}}$, then $S'$ is

$$\{p'\}\ S_3' ;\ S_4\ \{q'\},$$

where $S_3'$ is the specification which is the image of $S_3$ under a while statement transformation,

$$T_3: S_3 \to S_3'.$$

Let $W \in C'$. Since

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\},$$

it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_3 \in L_B$ satisfies (iii), then

(v) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\},$$

or $W \in C$. If we assume that $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.


c) If $S$ is a conditional specification,

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

for some specifications $S_3, S_4$ from $L^S_B$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. If $S_3 \in L_{\{p\}\{q\}}$, then $S'$ is

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3'\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

where $S_3'$ is the specification which is the image of $S_3$ under a while statement transformation,

$$T_3: S_3 \to S_3'.$$

Let $W \in C'$. Since

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\},$$

it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, if $W_3 \in L_B$ satisfies (iii), then

(v) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\},$$

or $W \in C$. If we assume that $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.


d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\},$$

for some specification $S_3$ from $L^S_B$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3'\ \mathbf{od}\ \{q'\},$$

where $S_3'$ is the specification which is the image of $S_3$ under a while statement transformation,

$$T_3: S_3 \to S_3'.$$

Let $W \in C'$. Since

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\},$$

it follows that for some pre- and post-conditions $p_3, q_3$ associated with $S_3$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, if $W_3 \in L_B$ satisfies (iii), then

(iv) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

It follows from (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\},$$

or $W \in C$.

Theorem: (Transformations on Specifications Containing Unknown Specifications) Let $(S, C)$ be an abstract program. Assume that $S$ is a specification from $L_{\{p\}\{q\}}$; that is, $S$ contains the unknown specification,

$$\{p\}\ \{q\},$$

and that $p', q'$ are the pre- and post-conditions associated with $S$. Let $C$ be the set

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}.$$

Let $T$ be a transformation from $S$ to $S'$ which is either an assignment statement transformation, a composed statement transformation, a conditional statement transformation, or a while statement transformation. Let $C'$ be the set

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

For each $W \in C'$, $W \in C$; that is, $C' \subseteq C$.

Proof: The proof is an immediate consequence of the preceding four lemmas.
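The four lemmas behind this theorem share one recursion: a transformation rewrites the unknown specification $\{p\}\ \{q\}$ in place and leaves the rest of $S$ unchanged, descending into whichever component of a composed, conditional, or while specification contains it (cases a through d). The Python sketch below computes the image $S'$ under an assumed tuple encoding that omits the outer pre- and post-conditions $p', q'$; the encoding and the helper names `contains`, `image`, `assignment`, `composed`, `conditional`, and `while_loop` are hypothetical illustrations, not the report's definitions.

```python
# Hypothetical tuple encoding of specifications (illustrative only):
#   ("unknown", pre, post)   an unknown specification {pre} {post}
#   ("assign", x, t)         x := t
#   ("seq", s1, s2)          s1 ; s2
#   ("if", e, s1, s2)        if e then s1 else s2 fi
#   ("while", e, s1)         while e do s1 od
# Outer pre- and post-conditions p', q' are omitted for brevity.


def contains(s, target):
    """True if specification s contains the unknown specification target."""
    if s == target:
        return True
    kind = s[0]
    if kind == "seq":
        return contains(s[1], target) or contains(s[2], target)
    if kind == "if":
        return contains(s[2], target) or contains(s[3], target)
    if kind == "while":
        return contains(s[2], target)
    return False


def image(s, target, replacement):
    """Compute S' = T(S) by the case analysis a)-d) used in the lemmas."""
    if s == target:                       # a) the unknown specification itself
        return replacement
    kind = s[0]
    if kind == "seq":                     # b) rewrite the component that holds it
        if contains(s[1], target):
            return ("seq", image(s[1], target, replacement), s[2])
        return ("seq", s[1], image(s[2], target, replacement))
    if kind == "if":                      # c) same, guard left unchanged
        if contains(s[2], target):
            return ("if", s[1], image(s[2], target, replacement), s[3])
        return ("if", s[1], s[2], image(s[3], target, replacement))
    if kind == "while":                   # d) rewrite inside the loop body
        return ("while", s[1], image(s[2], target, replacement))
    raise ValueError("the unknown specification does not occur in s")


# The four kinds of transformation, expressed as replacements for {p} {q}.
def assignment(x, t):
    return ("assign", x, t)


def composed(p1, q1, p2, q2):
    return ("seq", ("unknown", p1, q1), ("unknown", p2, q2))


def conditional(e, p1, q1, p2, q2):
    return ("if", e, ("unknown", p1, q1), ("unknown", p2, q2))


def while_loop(e, p1, q1):
    return ("while", e, ("unknown", p1, q1))
```

For instance, applying `conditional("e", "p1", "q1", "p2", "q2")` to `("seq", ("assign", "y", "1"), ("while", "e1", ("unknown", "p", "q")))` replaces only the loop body, which mirrors why each lemma needs just one inductive case per construct.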

Theorem: (Development Step, general case) Let $(S, C)$ be an abstract program. Assume that $S$ is a specification from $L_{\{p\}\{q\}}$; that is, $S$ contains the unknown specification,

$$\{p\}\ \{q\},$$

and $C$ is

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}.$$

Let $T$ be a transformation from $S$ to $S'$ which is either an assignment statement transformation, a composed statement transformation, a conditional statement transformation, or a while statement transformation. Let $C'$ be the set

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

Then $(S', C')$ is an abstract program and the pair of abstract programs,

$$((S, C),\ (S', C')),$$

is a development step with the property that $C' \subseteq C$.

Proof: This is an immediate consequence of the theorem on the construction of a new abstract program and the preceding theorem.

Theorem: (Development Step Correctness, general case) Let $(S, C)$ be an abstract program. Assume that $S$ is a specification from $L_{\{p\}\{q\}}$; that is, $S$ contains the unknown specification,

$$\{p\}\ \{q\},$$

and $C$ is

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}.$$

Let $T$ be a transformation from $S$ to $S'$ which is either an assignment statement transformation, a composed statement transformation, a conditional statement transformation, or a while statement transformation. Let $C'$ be the set

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\},$$

and suppose that $C' \neq \emptyset$. Then $(S', C')$ is an abstract program and the pair of abstract programs,

$$((S, C),\ (S', C')),$$

is a correct development step.

Proof: This is an immediate consequence of the preceding theorem and the definition of a correct development step.


5.6.  Obtaining an Implementation Using Proof Rules

In the preceding section, given an abstract program,

$$(S, C),$$

and a specification transformation,

$$T: S \to S',$$

we have a new abstract program,

$$(S', C'),$$

for which

$$((S, C),\ (S', C'))$$

is a development step with the property that $C' \subseteq C$. In order to use a development step in the development of a program, we need to start with a $W \in C$, a transformation,

$$T: S \to S',$$

and conditions which, when satisfied, guarantee that $W \in C'$. This is the main result of this section. The conditions are the proof rules given in section 5.4. In general, $W \in C$ does not imply $W \in C'$, since the set inclusion relation from section 5.5 is $C' \subseteq C$. The main theorem that we prove in this section requires four lemmas; each lemma obtains the result for one of the four kinds of specification transformations.
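A small finite model may help picture the inclusion $C' \subseteq C$ and why $W \in C$ alone is not enough. The Python sketch below is purely illustrative and rests on several assumptions not made in the report: a single integer variable $x$ over a bounded range, $p \equiv x \ge 0$, $q \equiv x \ge 1$, an assignment statement transformation with the term $t = x + 1$, and triple satisfaction approximated by exhaustive checking over that range rather than by derivation in $\mathrm{Th}(I)$. The candidate programs and their names are hypothetical.

```python
# Toy model: one integer variable x over a small finite set of states.
# Programs are hypothetical tuples:
#   ("assign", term_text, f)   x := t, where f computes the new value of x
#   ("if", guard, w1, w2)      if guard then w1 else w2 fi
STATES = range(-5, 6)


def run(w, x):
    """Execute program w from state x and return the final state."""
    if w[0] == "assign":
        return w[2](x)
    if w[0] == "if":
        return run(w[2], x) if w[1](x) else run(w[3], x)
    raise ValueError(w[0])


def valid(p, w, q):
    """{p} w {q} holds in every state of the finite model."""
    return all(q(run(w, x)) for x in STATES if p(x))


def p(x):
    return x >= 0


def q(x):
    return x >= 1


candidates = {
    "inc": ("assign", "x + 1", lambda x: x + 1),
    "dec": ("assign", "x - 1", lambda x: x - 1),
    "clamp": ("if", lambda x: x > 0,
              ("assign", "x", lambda x: x),
              ("assign", "1", lambda x: 1)),
}

# C: candidates satisfying the unknown specification {p} {q}.
C = {name for name, w in candidates.items() if valid(p, w, q)}

# C': candidates satisfying S' = {p} x := t {q} for the chosen t = x + 1,
# i.e. they must be exactly that assignment and also meet {p} {q}.
C_prime = {name for name, w in candidates.items()
           if w[0] == "assign" and w[1] == "x + 1" and valid(p, w, q)}
```

Here `clamp` lies in $C$ but not in $C'$, which illustrates why additional conditions are needed before concluding $W \in C'$.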


Lemma: (Assignment Statement Implementations, general case) Let $T$ be an assignment statement transformation of the specification $S$, which contains the unknown specification,

$$\{p\}\ \{q\},$$

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p', q'$ be the pre- and post-conditions associated with $S$ and $S'$. Associated with the specific assignment statement transformation $T$ are a variable $x \in V$ and a term $t$ from $T_B$. Let $C$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$$

and let $C'$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

If $W \in C$ and the assignment statement proof rule holds for $S$, then $W \in C'$.

Proof: The proof is by induction on the specification $S$.

a) If $S$ is the unknown specification,

$$\{p\}\ \{q\},$$

then $S'$ is

$$\{p\}\ x := t\ \{q\}.$$

If

(i) $W$ is $x := t$, and

(ii) $\mathrm{Th}(I) \vdash \{p\}\ x := t\ \{q\}$,

then $W \in C'$. Condition (i) follows from the fact that $T$ is an assignment statement transformation from $S$ to $S'$. Condition (ii) follows from the assumption that the assignment statement proof rule holds for $S$.
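Condition (ii) is a Hoare triple for a single assignment. Assuming the assignment statement proof rule of section 5.4 takes the familiar Hoare form, namely that $p$ implies $q[t/x]$ (the postcondition with $t$ substituted for $x$), it can be checked pointwise in a finite model. The minimal Python sketch below assumes one integer variable over a small range, so that $q[t/x]$ evaluated in state $x$ is simply $q(t(x))$; the state range and the names `substitute` and `assignment_rule_holds` are illustrative assumptions.

```python
from typing import Callable

Pred = Callable[[int], bool]
Term = Callable[[int], int]

# Hypothetical finite state space: the possible values of the single variable x.
STATES = range(-5, 6)


def substitute(q: Pred, t: Term) -> Pred:
    """q[t/x]: the postcondition with the term t substituted for x."""
    return lambda x: q(t(x))


def assignment_rule_holds(p: Pred, t: Term, q: Pred) -> bool:
    """Check p => q[t/x] in every state of the finite model."""
    wp = substitute(q, t)
    return all(wp(x) for x in STATES if p(x))
```

With $p \equiv x \ge 0$ and $q \equiv x \ge 1$, the check succeeds for $t = x + 1$ and fails for $t = x - 1$ (at $x = 0$).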

b) If $S$ is a composed specification,

$$\{p'\}\ S_1 ;\ S_2\ \{q'\},$$

for some specifications $S_1, S_2$ from $L^S_B$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. Assume that $S_1 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ T_1(S_1) ;\ S_2\ \{q'\},$$

where

$$T_1: S_1 \to S_1'$$

is an assignment statement transformation for which the assignment statement proof rule holds for $S_1$. Since $W \in C$, it follows that for some pre- and post-conditions $p_1, q_1$ and $p_2, q_2$ associated with $S_1$ and $S_2$, respectively,

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

From (iii) and the induction hypothesis,

(v) $\mathrm{Th}(I) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If we assume that $S_2 \in L_{\{p\}\{q\}}$, then the proof is similar.


c) If $S$ is a conditional specification,

$$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\},$$

for some quantifier free formula $e$ from $\mathrm{QFF}_B$, and some specifications $S_1, S_2$ from $L^S_B$, then either $S_1$ or $S_2 \in L_{\{p\}\{q\}}$. If $S_1 \in L_{\{p\}\{q\}}$, then $S'$ is

$$\{p'\}\ \mathbf{if}\ e\ \mathbf{then}\ T_1(S_1)\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q'\},$$

where

$$T_1: S_1 \to S_1'$$

is an assignment statement transformation for which the assignment statement proof rule holds for $S_1$. Since $W \in C$, it follows that for some pre- and post-conditions $p_1, q_1$ and $p_2, q_2$ associated with $S_1$ and $S_2$, respectively,

(i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$.

From (iii) and the induction hypothesis,

(v) $\mathrm{Th}(I) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If we assume that $S_2 \in L_{\{p\}\{q\}}$, then the proof is similar.
July 29.  DRAFT
d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q'\},$$

for some specification $S_1$ from $L^S_B$, and some quantifier free formula $e$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e\ \mathbf{do}\ T_1(S_1)\ \mathbf{od}\ \{q'\},$$

where

$$T_1: S_1 \to S_1'$$

is an assignment statement transformation for which the assignment statement proof rule holds for $S_1$. Since $W \in C$, it follows that for some pre- and post-conditions $p_1, q_1$ associated with $S_1$,

(i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$.

From (iii) and the induction hypothesis,

(iv) $\mathrm{Th}(I) \vdash^{S_1'} \{p_1\}\ W_1\ \{q_1\}$.

It follows from (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\},$$

or $W \in C'$.

Lemma: (Composed Statement Implementations, general case) Let $T$ be a composed statement transformation of the specification $S$, which contains the unknown specification,

$$\{p\}\ \{q\},$$

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p', q'$ be the pre- and post-conditions associated with $S$ and $S'$. Associated with the specific composed statement transformation, $T$, are formulas $p_1, p_2, q_1, q_2$ from $\mathrm{WFF}_B$, and the specifications, $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$, from $L^S_B$. Let $C$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$$

and let $C'$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

If $W \in C$ and the composed statement proof rules hold for $S$, then $W \in C'$.

Proof: The proof is by induction on the specification $S$.

a) If $S$ is the unknown specification,

$$\{p\}\ \{q\},$$

then $S'$ is

$$\{p\}\ S_1 ;\ S_2\ \{q\},$$

where $S_1$ is $\{p_1\}\ \{q_1\}$ and $S_2$ is $\{p_2\}\ \{q_2\}$. If

(i) $W$ is $W_1 ; W_2$ for some $W_1, W_2 \in L_B$,

(ii) $\mathrm{Th}(I) \vdash \{p\}\ W\ \{q\}$,

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$, and

(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$,

then $W \in C'$. Condition (i) follows from the fact that $T$ is a composed statement transformation from $S$ to $S'$. Conditions (ii)-(iv) are consequences of the composed statement proof rules.
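If, as an assumption, the composed statement proof rules of section 5.4 resemble the usual Hoare sequencing rule combined with the rule of consequence, then the side conditions linking the outer triple $\{p\}\ \{q\}$ to $S_1 = \{p_1\}\ \{q_1\}$ and $S_2 = \{p_2\}\ \{q_2\}$ would be $p \Rightarrow p_1$, $q_1 \Rightarrow p_2$, and $q_2 \Rightarrow q$. The minimal Python sketch below checks these entailments in a hypothetical finite model of one integer variable; the state range and the names `entails` and `composed_rules_hold` are illustrative only.

```python
from typing import Callable

Pred = Callable[[int], bool]
STATES = range(-5, 6)   # hypothetical finite state space for x


def entails(a: Pred, b: Pred) -> bool:
    """a => b in every state of the finite model."""
    return all(b(x) for x in STATES if a(x))


def composed_rules_hold(p: Pred, q: Pred, p1: Pred, q1: Pred,
                        p2: Pred, q2: Pred) -> bool:
    """Side conditions for {p} S1 ; S2 {q} with S1 = {p1}{q1}, S2 = {p2}{q2}."""
    return entails(p, p1) and entails(q1, p2) and entails(q2, q)
```

For example, $\{x \ge 0\}\ \{x \ge 1\}$ followed by $\{x \ge 1\}\ \{x \ge 2\}$ fits the outer specification $\{x \ge 0\}\ \{x \ge 2\}$, while a second component requiring $x \ge 2$ on entry does not, since $x = 1$ satisfies $q_1$ but not $p_2$.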

b) If $S$ is a composed specification,

$$\{p'\}\ S_3 ;\ S_4\ \{q'\},$$

for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ T_3(S_3) ;\ S_4\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a composed statement transformation for which the composed statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.

c) If $S$ is a conditional specification,

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

for some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, and for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ T_3(S_3)\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a composed statement transformation for which the composed statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.

d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\},$$
for some specification $S_3$ from $L_{\{p\}\{q\}}$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ T_3(S_3)\ \mathbf{od}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a composed statement transformation for which the composed statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ associated with $S_3$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, it follows from (iii) that

(iv) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$.

Lemma: (Conditional Statement Implementations, general case) Let $T$ be a conditional statement transformation of the specification $S$, which contains the unknown specification,

$$\{p\}\ \{q\},$$

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p', q'$ be the pre- and post-conditions associated with $S$ and $S'$. Associated with the specific conditional statement transformation, $T$, are formulas $p_1, p_2, q_1, q_2$ from $\mathrm{WFF}_B$, the specifications $\{p_1\}\ \{q_1\}$ and $\{p_2\}\ \{q_2\}$ from $L^S_B$, and the quantifier free formula $e$ from $\mathrm{QFF}_B$. Let $C$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$$

and let $C'$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

If $W \in C$ and the conditional statement proof rules hold for $S$, then $W \in C'$.

Proof: The proof is by induction on the specification $S$.

a) If $S$ is the unknown specification,

$$\{p\}\ \{q\},$$

then $S'$ is

$$\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q\},$$

where $S_1$ is $\{p_1\}\ \{q_1\}$ and $S_2$ is $\{p_2\}\ \{q_2\}$. If

(i) $W$ is $\mathbf{if}\ e\ \mathbf{then}\ W_1\ \mathbf{else}\ W_2\ \mathbf{fi}$ for some $W_1, W_2 \in L_B$,

(ii) $\mathrm{Th}(I) \vdash \{p\}\ W\ \{q\}$,

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$, and

(iv) $\mathrm{Th}(I) \vdash^{S_2} \{p_2\}\ W_2\ \{q_2\}$,

then $W \in C'$. Condition (i) follows from the fact that $T$ is a conditional statement transformation from $S$ to $S'$. Conditions (ii)-(iv) are consequences of the conditional statement proof rules.

b) If $S$ is a composed specification,

$$\{p'\}\ S_3 ;\ S_4\ \{q'\},$$

for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ T_3(S_3) ;\ S_4\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a conditional statement transformation for which the conditional statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.

c) If $S$ is a conditional specification,

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

for some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, and for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ T_3(S_3)\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a conditional statement transformation for which the conditional statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.

d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\},$$

for some specification $S_3$ from $L_{\{p\}\{q\}}$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ T_3(S_3)\ \mathbf{od}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a conditional statement transformation for which the conditional statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ associated with $S_3$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, it follows from (iii) that

(iv) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.
It follows from conditions (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$.
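For the conditional lemma, and again only as an assumption about section 5.4, suppose the conditional statement proof rules amount to the usual Hoare rule for the conditional combined with consequence. The side conditions for $\{p\}\ \mathbf{if}\ e\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi}\ \{q\}$ would then be $p \wedge e \Rightarrow p_1$, $p \wedge \neg e \Rightarrow p_2$, $q_1 \Rightarrow q$, and $q_2 \Rightarrow q$. A hypothetical finite-model check in Python (state range and function names are illustrative):

```python
from typing import Callable

Pred = Callable[[int], bool]
STATES = range(-5, 6)   # hypothetical finite state space for x


def entails(a: Pred, b: Pred) -> bool:
    """a => b in every state of the finite model."""
    return all(b(x) for x in STATES if a(x))


def conditional_rules_hold(p: Pred, q: Pred, e: Pred, p1: Pred, q1: Pred,
                           p2: Pred, q2: Pred) -> bool:
    """Side conditions for {p} if e then S1 else S2 fi {q},
    with S1 = {p1}{q1} and S2 = {p2}{q2}."""
    return (entails(lambda x: p(x) and e(x), p1)        # then-branch entry
            and entails(lambda x: p(x) and not e(x), p2)  # else-branch entry
            and entails(q1, q)                            # then-branch exit
            and entails(q2, q))                           # else-branch exit
```

With $p$ true everywhere, $e \equiv x > 0$, $S_1 = \{x > 0\}\ \{x \ge 1\}$, $S_2 = \{x \le 0\}\ \{x = 1\}$, and $q \equiv x \ge 1$ the conditions hold; strengthening $p_2$ to $x < 0$ breaks them at $x = 0$.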

Lemma: (While Statement Implementations, general case) Let $T$ be a while statement transformation of the specification $S$, which contains the unknown specification,

$$\{p\}\ \{q\},$$

where $p, q$ are formulas from $\mathrm{WFF}_B$. Let $S'$ be the image of $S$ under $T$ and let $p', q'$ be the pre- and post-conditions associated with $S$ and $S'$. Associated with the specific while statement transformation, $T$, are formulas $p_1, q_1$ from $\mathrm{WFF}_B$, the specification $\{p_1\}\ \{q_1\}$ from $L^S_B$, and the quantifier free formula $e$ from $\mathrm{QFF}_B$. Let $C$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S} \{p'\}\ W\ \{q'\} \,\}$$

and let $C'$ be

$$\{\, W \in L_B \mid \mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\} \,\}.$$

If $W \in C$ and the while statement proof rules hold for $S$, then $W \in C'$.

Proof: The proof is by induction on the specification $S$.

a) If $S$ is the unknown specification,

$$\{p\}\ \{q\},$$

then $S'$ is

$$\{p\}\ \mathbf{while}\ e\ \mathbf{do}\ S_1\ \mathbf{od}\ \{q\},$$

where $S_1$ is $\{p_1\}\ \{q_1\}$. If

(i) $W$ is $\mathbf{while}\ e\ \mathbf{do}\ W_1\ \mathbf{od}$ for some $W_1 \in L_B$,

(ii) $\mathrm{Th}(I) \vdash \{p\}\ W\ \{q\}$, and

(iii) $\mathrm{Th}(I) \vdash^{S_1} \{p_1\}\ W_1\ \{q_1\}$,

then $W \in C'$. Condition (i) follows from the fact that $T$ is a while statement transformation from $S$ to $S'$. Conditions (ii) and (iii) are consequences of the while statement proof rules.
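For the while lemma, one common reading of the while statement proof rules, assumed here rather than taken from section 5.4, treats $p$ itself as the loop invariant: $p \wedge e \Rightarrow p_1$, $q_1 \Rightarrow p$, and $p \wedge \neg e \Rightarrow q$. This yields partial correctness only; termination is not checked. A hypothetical finite-model sketch in Python (state range and names are illustrative):

```python
from typing import Callable

Pred = Callable[[int], bool]
STATES = range(-5, 6)   # hypothetical finite state space for x


def entails(a: Pred, b: Pred) -> bool:
    """a => b in every state of the finite model."""
    return all(b(x) for x in STATES if a(x))


def while_rules_hold(p: Pred, q: Pred, e: Pred, p1: Pred, q1: Pred) -> bool:
    """Side conditions for {p} while e do S1 od {q} with S1 = {p1}{q1},
    taking p as the loop invariant (partial correctness only)."""
    return (entails(lambda x: p(x) and e(x), p1)          # entering the body
            and entails(q1, p)                             # body re-establishes p
            and entails(lambda x: p(x) and not e(x), q))   # exit yields q
```

For a countdown loop with $p \equiv x \ge 0$, $e \equiv x > 0$, body specification $\{x > 0\}\ \{x \ge 0\}$, and $q \equiv x = 0$, all three conditions hold; asking for $q \equiv x = 1$ instead fails on exit.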

b) If $S$ is a composed specification,

$$\{p'\}\ S_3 ;\ S_4\ \{q'\},$$

for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ T_3(S_3) ;\ S_4\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a while statement transformation for which the while statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $W_3 ; W_4$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.
c) If $S$ is a conditional specification,

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ S_3\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

for some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, and for some specifications $S_3, S_4$ from $L^S_B$, then either $S_3$ or $S_4 \in L_{\{p\}\{q\}}$. Assume that $S_3 \in L_{\{p\}\{q\}}$. The specification $S'$ is

$$\{p'\}\ \mathbf{if}\ e_1\ \mathbf{then}\ T_3(S_3)\ \mathbf{else}\ S_4\ \mathbf{fi}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a while statement transformation for which the while statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ and $p_4, q_4$ associated with $S_3$ and $S_4$, respectively,

(i) $W$ is $\mathbf{if}\ e_1\ \mathbf{then}\ W_3\ \mathbf{else}\ W_4\ \mathbf{fi}$ for some $W_3, W_4 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

(iv) $\mathrm{Th}(I) \vdash^{S_4} \{p_4\}\ W_4\ \{q_4\}$.

Using the induction hypothesis, it follows from (iii) that

(v) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), (iv), and (v) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$. If $S_4 \in L_{\{p\}\{q\}}$, then the proof is similar.
d) If $S$ is a while specification,

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ S_3\ \mathbf{od}\ \{q'\},$$

for some specification $S_3$ from $L_{\{p\}\{q\}}$, and some quantifier free formula $e_1$ from $\mathrm{QFF}_B$, then $S'$ is

$$\{p'\}\ \mathbf{while}\ e_1\ \mathbf{do}\ T_3(S_3)\ \mathbf{od}\ \{q'\},$$

where

$$T_3: S_3 \to S_3'$$

is a while statement transformation for which the while statement proof rules hold for $S_3$. Since $W \in C$, it follows that for some pre- and post-conditions $p_3, q_3$ associated with $S_3$,

(i) $W$ is $\mathbf{while}\ e_1\ \mathbf{do}\ W_3\ \mathbf{od}$ for some $W_3 \in L_B$.

(ii) $\mathrm{Th}(I) \vdash \{p'\}\ W\ \{q'\}$.

(iii) $\mathrm{Th}(I) \vdash^{S_3} \{p_3\}\ W_3\ \{q_3\}$.

Using the induction hypothesis, it follows from (iii) that

(iv) $\mathrm{Th}(I) \vdash^{S_3'} \{p_3\}\ W_3\ \{q_3\}$.

It follows from conditions (i), (ii), and (iv) that

$$\mathrm{Th}(I) \vdash^{S'} \{p'\}\ W\ \{q'\}.$$

Therefore, $W \in C'$.

Theorem: (Construction of a New Abstract Program) Let $(S, C)$ be an abstract program. Assume that $S$ is a specification from $L_{\{p\}\{q\}}$; that is, $S$ contains the unknown specification,

$$\{p\}\ \{q\}.$$

Let $T$ be a transformation from $S$ to $S'$ which is either an assignment statement transformation, a composed statement transformation, a conditional statement transformation, or a while statement transformation. Let $W \in C$ be such that the proof rules for $S$ corresponding to the transformation $T$ hold. Let $C'$ be the set of implementations associated with $S'$. Then $W \in C'$.

Proof: There are four cases. Either $T$ is an assignment statement transformation, a composed statement transformation, a conditional statement transformation, or a while statement transformation. In each case, it follows from one of the four preceding lemmas that $W \in C'$.
6.  Conclusions


More work is needed here, especially on the implications of the proof rules.
Abstract

The Vienna Development Method (VDM) supports the top-down development of software specified in a notation suitable for formal verification. Components are first written using a combination of conventional programming languages and predicate logic. These abstract components are then incrementally refined into components in an implementation language. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. VDM has been used in industrial applications to enhance the development process. In such environments VDM is applied in an informal, non-automated manner; verification conditions are generated and certified without the aid of specialized tools, and data types are not formally axiomatized. We propose that an automated environment supporting a formal development method similar to VDM can be constructed, and that the environment will enhance the development method. For the thesis, we will design and build a prototype environment, and demonstrate that it enhances the VDM-style development process. The environment will support the use of executable specifications and mechanical theorem proving, as well as provide simple facilities for configuration control and project management.

1.  Introduction 

It is widely acknowledged that producing correct software is both difficult and expensive. To help remedy this situation, many methods for specifying and verifying software have been developed[10,17]. The SAGA (Software Automation, Generation and Administration) project is investigating both the formal and practical aspects of providing automated support for the full range of software engineering activities^^]. ENCOMPASS[22,23] is an integrated environment to support the construction of software in a manner similar to the Vienna Development Method[13]. PLEASE is the wide-spectrum, executable specification and design language used in ENCOMPASS[24,25]. For the thesis, we will design and implement prototype versions of PLEASE and ENCOMPASS and demonstrate that they enhance the software development process.
The first step in the production of a software system is usually the creation of a specification which describes the functions and properties of the desired system. We say that a specification is validated when it is shown to correctly reflect the users' desires[8]. Producing a valid specification is a difficult task. The users of the system may not really know what they want, and they may be unable to communicate their desires to the development team. If the specification is in a formal notation it may be an ineffective medium for communication with the customers, but natural language specifications are notoriously ambiguous and incomplete. Prototyping[11,16] and the use of executable specification languages[14,27] have been suggested as partial solutions to these problems. Providing the customers with prototypes for experimentation and evaluation early in the development process may increase customer/developer communication and enhance the validation and design processes.

Even with a validated specification, producing a correct implementation is not an easy task. We say that an implementation is verified when it is shown to satisfy the specification[8]. Many methodologies for the design and development of correct implementations have been proposed[1,2,13,19]. For example, it has been suggested that top-down development can help control the complexity of program construction. By using stepwise refinement[28] to create a concrete implementation from an abstract specification, we divide the necessary decisions into smaller, more comprehensible groups.

The  Vienna  Development  Method  (VDM)  supports  the  top-down  development  of  programs  specified 
in  a notation  suitable  for  mathematical  verification [13,21].  In  this  method,  programs  are  first  written  in  a 
language  combining  elements  from  conventional  programming  languages  and  mathematics.  A procedure 
or  function  may  be  specified  using  pre-  and  post-conditions  written  in  predicate  logic;  similarly,  a data 
type  may  have  an  invariant.  These  abstract  programs  are  then  incrementally  refined  into  programs  in  an 
implementation  language.  The  refinements  are  performed  one  at  a time,  and  each  is  verified  before 
another  is  applied;  therefore,  the  final  program  produced  by  the  development  satisfies  the  original 
specification. 
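To make pre-conditions, post-conditions and invariants concrete, here is a minimal Python sketch in which they are checked at run time instead of being proved. It is only an illustration under our own assumptions: the decorator `specified`, the integer square root operation and the `SortedList` type are hypothetical examples, not part of VDM or its notation.

```python
from typing import Callable


def specified(pre: Callable[..., bool], post: Callable[..., bool]):
    """Wrap a function so that its pre- and post-condition are checked on every call.

    Hypothetical helper: VDM proves these conditions; this sketch only tests them.
    """
    def wrap(fn: Callable) -> Callable:
        def checked(*args):
            if not pre(*args):
                raise AssertionError(f"pre-condition violated for {args}")
            result = fn(*args)
            if not post(*args, result):
                raise AssertionError(f"post-condition violated: {args} -> {result}")
            return result
        return checked
    return wrap


# Specification: for n >= 0, return r such that r*r <= n < (r+1)*(r+1).
@specified(pre=lambda n: n >= 0,
           post=lambda n, r: r * r <= n < (r + 1) * (r + 1))
def isqrt(n: int) -> int:
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


class SortedList:
    """A data type whose invariant is that its items are in ascending order."""

    def __init__(self, items=()):
        self.items = sorted(items)
        self._check()

    def invariant(self) -> bool:
        return all(a <= b for a, b in zip(self.items, self.items[1:]))

    def _check(self) -> None:
        if not self.invariant():
            raise AssertionError("invariant violated")

    def insert(self, x) -> None:
        # Find the first position whose item is not smaller than x.
        i = 0
        while i < len(self.items) and self.items[i] < x:
            i += 1
        self.items.insert(i, x)
        self._check()  # every operation must re-establish the invariant
```

For instance, `isqrt(10)` returns 3, while `isqrt(-1)` is rejected because its pre-condition fails.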

ENCOMPASS is an environment being created by the SAGA project to provide automated support for all aspects of a development method similar to VDM. We believe that neither testing[9,18], technical review[7], nor formal verification[17] alone can guarantee program correctness; therefore, ENCOMPASS provides a framework in which all three methods can be used as needed. ENCOMPASS includes a number of tools: a language-oriented editor; a test harness; a configuration control and project management system; and a user interface package. ENCOMPASS is in the early stages of development; an initial prototype has been constructed and used to develop small programs. ENCOMPASS is described in more detail in [23], which is also Appendix A of this paper; early reports on the environment can be found in [3,15,22].

PLEASE is the wide-spectrum, executable specification language used in ENCOMPASS. PLEASE extends its underlying implementation, or base, language so that a procedure or function may be specified with pre- and post-conditions and an implementation may be completely annotated. At present, all our efforts involve Ada¹ as the base language. PLEASE specifications may be used in proofs of correctness; they also may be transformed into prototypes which use Prolog[6] to “execute” pre- and post-conditions and which may interact with other modules written in the base language. We believe that the early production of executable prototypes for experimentation and evaluation will enhance the software development process. PLEASE is described in more detail in [24], which is also Appendix B to this paper; a preliminary report on the language can be found in [25].
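The idea of “executing” a post-condition can be illustrated, very loosely, by generate-and-test: enumerate candidate outputs and return the first one that satisfies the post-condition, much as a Prolog goal backtracks over alternatives. This Python sketch is not how PLEASE translates specifications into Prolog; the function `execute_spec` and the use of permutations as the candidate generator are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations
from typing import Any, Callable, Iterable, Optional


def execute_spec(pre: Callable[[Any], bool],
                 post: Callable[[Any, Any], bool],
                 candidates: Callable[[Any], Iterable[Any]],
                 x: Any) -> Optional[Any]:
    """'Execute' a specification by generate-and-test.

    Candidate outputs are tried in order, much as a Prolog goal backtracks over
    alternatives; the first one satisfying the post-condition is returned.
    Returns None when no candidate satisfies the post-condition.
    """
    if not pre(x):
        raise ValueError("input does not satisfy the pre-condition")
    for y in candidates(x):
        if post(x, y):
            return y
    return None


def is_ordered(ys) -> bool:
    return all(a <= b for a, b in zip(ys, ys[1:]))


def prototype_sort(xs: list) -> Optional[list]:
    """Prototype built only from the specification of sorting:
    the result is an ordered permutation of the input."""
    return execute_spec(
        pre=lambda _: True,
        post=lambda a, b: Counter(a) == Counter(b) and is_ordered(b),
        candidates=lambda a: (list(p) for p in permutations(a)),
        x=xs,
    )
```

Here `prototype_sort([3, 1, 2])` yields `[1, 2, 3]`. The search is exponential in the length of the list, which is tolerable for a throwaway prototype used in validation but not for production use.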

IDEAL is the programming-in-the-small environment used within ENCOMPASS[23]. IDEAL supports the specification, construction, validation, and verification of single modules. It includes ISLET, a simple language-oriented editor which supports the creation of PLEASE specifications and their refinement into Ada implementations. As the specifications are created and refined, their syntax and semantics are constantly checked. From IDEAL, the user can invoke commands to create Ada/Prolog prototypes from PLEASE specifications. IDEAL also includes an interface to the ENCOMPASS test harness and to TED, a proof management system which is interfaced to a number of theorem provers[12].

In section two of this paper, we describe the development methodology which PLEASE, IDEAL, and ENCOMPASS are designed to support, and in section three, we present a proposed thesis outline. In section four, we give completion criteria for the thesis, and in section five, we summarize the proposed research and expected results.

¹Ada is a trademark of the US Government, Ada Joint Program Office.

2.  Software  Development  in  ENCOMPASS 

ENCOMPASS is based on a traditional or phased[8] life-cycle model extended to support executable specifications and formal verification. In ENCOMPASS, a development passes through the phases: planning, requirements definition, validation, refinement and system integration. In the requirements definition phase, the functions and properties of the software to be produced by the development are determined[8]. In ENCOMPASS, software requirements specifications are a combination of natural language and components specified in PLEASE. Although a software system may be shown to meet its specification, this does not imply that the system satisfies the customers’ requirements. In ENCOMPASS, we extend the traditional life-cycle to include a separate phase for customer validation.

The validation phase attempts to show that any system which satisfies the specification will also satisfy the customers’ requirements, that is, that the requirements specification is valid. If not, then the requirements specification should be corrected before the development proceeds any further. To aid in the validation process, the PLEASE components in the specification may be transformed into executable prototypes which satisfy the specifications. These prototypes may be used in interactions with the customers; they may be subjected to a series of tests, be delivered to the customers for experimentation and evaluation, or be installed for production use on a trial basis. The use of prototypes may increase customer/developer communication and enhance the validation process. If it is found that the specification does not satisfy the customers, then it is revised, new prototypes are produced, and the validation process is reinitiated; this cycle is repeated until a validated specification is produced.

In general, this process does not guarantee that the specification is valid. The fact that the prototype satisfies the customers means only that at least one implementation which satisfies the specification is acceptable. For example, the post-condition for a procedure may hold true for an infinite number of values, while the prototype will only return one. We say the specification of a component is complete if, for any input state, it is satisfied by only one output state. Although in some cases it is possible to require and verify that the specification of a component is complete, this is difficult in practice. We believe that while prototypes may enhance the validation process, they do not replace communication with the customers and review of the specification.
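Completeness can be checked mechanically only when the input and output domains are small and finite, which is part of why it is difficult in practice. The Python sketch below, built on that assumption, enumerates both domains and reports every input admitted by more than one output; the function name and the square-root example are hypothetical. It also shows how a single-valued prototype can hide ambiguity: a prototype for `any_root` might always answer 2 for the input 4, even though -2 also satisfies the post-condition.

```python
from typing import Callable, Iterable, List


def ambiguous_inputs(pre: Callable[[int], bool],
                     post: Callable[[int, int], bool],
                     inputs: Iterable[int],
                     outputs: Iterable[int]) -> List[int]:
    """Return the inputs (satisfying pre) for which more than one output satisfies post.

    Reading "only one output state" as "at most one", the specification is complete
    on these finite domains exactly when the list is empty. Inputs with no satisfying
    output (an unsatisfiable specification) are a separate problem, not reported here.
    """
    outs = list(outputs)
    found = []
    for x in inputs:
        if pre(x) and sum(1 for y in outs if post(x, y)) > 1:
            found.append(x)
    return found


def always(n: int) -> bool:
    return True


def any_root(n: int, r: int) -> bool:
    return r * r == n               # admits both +r and -r


def nonneg_root(n: int, r: int) -> bool:
    return r >= 0 and r * r == n    # admits exactly one r


squares = [0, 1, 4, 9]
print(ambiguous_inputs(always, any_root, squares, range(-5, 6)))     # [1, 4, 9]
print(ambiguous_inputs(always, nonneg_root, squares, range(-5, 6)))  # []
```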

In the refinement phase, the validated specification is incrementally transformed into a program in the implementation language; this process is viewed as the construction of a proof in the Hoare calculus. In ENCOMPASS, the refinement process is supported by a language-oriented editor similar to the one described in [20]. As the specification is transformed into an implementation (and the proof is constructed), the syntax and semantics are constantly checked. Many steps in the refinement will generate verification conditions in the underlying first-order logic. These are algebraically simplified and then subjected to a number of simple proof tactics. If these fail, the verification conditions are passed to TED, a proof management system which is interfaced to a number of theorem provers[12]. In our experience, it is too expensive to mechanically certify all of the verification conditions; therefore, the implementor can simply “check off” the verification conditions for a refinement and continue. The verification conditions are recorded by ENCOMPASS for use in project monitoring, management and debugging.
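The verification conditions generated by a refinement step can be illustrated with Hoare's assignment axiom, {Q[e/x]} x := e {Q}: to establish {P} x := e {Q}, it suffices to show that P implies Q[e/x]. The Python sketch below is our own illustration, not ENCOMPASS code. It computes that condition for a straight-line sequence of assignments and, instead of simplifying it or handing it to a theorem prover, merely tests it on a small grid of states. The names `wp_assign`, `wp_seq` and `vc_counterexamples` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, int]
Pred = Callable[[State], bool]
Expr = Callable[[State], int]
Assignment = Tuple[str, Expr]


def wp_assign(var: str, expr: Expr, post: Pred) -> Pred:
    """Assignment axiom: the weakest precondition of `var := expr` is post[expr/var]."""
    return lambda s: post({**s, var: expr(s)})


def wp_seq(stmts: Sequence[Assignment], post: Pred) -> Pred:
    """Weakest precondition of a sequence of assignments, working backwards from post."""
    for var, expr in reversed(stmts):
        post = wp_assign(var, expr, post)
    return post


def vc_counterexamples(pre: Pred, stmts: Sequence[Assignment], post: Pred,
                       names: Sequence[str], values: Sequence[int]) -> List[State]:
    """Test the verification condition pre => wp(stmts, post) on a finite grid of states.

    An empty result is evidence, not a proof, that the condition holds.
    """
    wp = wp_seq(stmts, post)
    failures = []
    for combo in product(values, repeat=len(names)):
        s = dict(zip(names, combo))
        if pre(s) and not wp(s):
            failures.append(s)
    return failures


# Swap x and y; a and b are logical variables recording the initial values.
pre = lambda s: s["x"] == s["a"] and s["y"] == s["b"]
post = lambda s: s["x"] == s["b"] and s["y"] == s["a"]
swap = [("t", lambda s: s["x"]), ("x", lambda s: s["y"]), ("y", lambda s: s["t"])]
naive = [("x", lambda s: s["y"]), ("y", lambda s: s["x"])]  # forgets the temporary

names = ["x", "y", "a", "b"]
print(vc_counterexamples(pre, swap, post, names, range(2)))        # []
print(len(vc_counterexamples(pre, naive, post, names, range(2))))  # 2
```

The two counterexamples for `naive` are exactly the states in which x and y start with different values.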

PLEASE specifications enhance the verification of system components using either testing or proof techniques. The specification of a component can be transformed into a prototype. This prototype may be used as a test oracle against which the implementation can be compared. Since the specification is formal, proof techniques may be used which range from a very detailed, completely formal proof using mechanical theorem proving to a development “annotated” with unproven verification conditions. PLEASE provides a framework for the rigorous[13] development of programs. Although detailed mechanical proofs are not required at every step, the framework is present so that they can be constructed if necessary. Parts of a project may use detailed mechanical verification, while other, less critical parts may be handled using less expensive techniques.
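Using a prototype as a test oracle amounts to running the implementation and the prototype on the same test data and comparing the results. The Python sketch below assumes a sorting component; `reference_sort` merely stands in for a prototype derived from a specification, and all names are hypothetical. Comparing outputs directly is only sound when the specification is complete; otherwise the implementation's result should be checked against the post-condition instead.

```python
import random
from typing import Callable, List, Tuple

Sorter = Callable[[List[int]], List[int]]


def oracle_test(impl: Sorter, oracle: Sorter,
                cases: List[List[int]]) -> List[Tuple[List[int], List[int], List[int]]]:
    """Run impl and the oracle on every case; return (input, expected, actual) for each mismatch."""
    failures = []
    for case in cases:
        expected = oracle(list(case))  # copies protect the test data from mutation
        actual = impl(list(case))
        if actual != expected:
            failures.append((case, expected, actual))
    return failures


def reference_sort(xs: List[int]) -> List[int]:
    """Stand-in for a prototype derived from the specification: slow but obviously correct."""
    out, rest = [], list(xs)
    while rest:
        smallest = min(rest)
        rest.remove(smallest)
        out.append(smallest)
    return out


def insertion_sort(xs: List[int]) -> List[int]:
    """The implementation under test."""
    out: List[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def buggy_sort(xs: List[int]) -> List[int]:
    return sorted(set(xs))  # silently drops duplicate elements


rng = random.Random(0)  # fixed seed so the generated test data are reproducible
cases = [[rng.randint(0, 5) for _ in range(rng.randint(0, 8))] for _ in range(50)]
print(oracle_test(insertion_sort, reference_sort, cases))      # []
print(oracle_test(buggy_sort, reference_sort, [[2, 1, 2]]))    # [([2, 1, 2], [1, 2, 2], [1, 2])]
```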

3.  Proposed  Thesis  Outline 

Figure 1 shows the proposed thesis outline. After the introductory comments, enough information on first-order predicate logic and the resolution principle is given to make the thesis self-contained. In the previous work section, results on program specification and verification, logic programming, development methods, life-cycle models and software engineering environments are given. In section four, both the abstract syntax and semantics of the PLEASE language are defined. In section five, the methods used to produce prototypes from PLEASE specifications are discussed, while in section six the use of these prototypes in software validation is explored. Section seven discusses the incremental refinement of PLEASE specifications into Ada implementations, while sections eight and nine discuss IDEAL and ENCOMPASS respectively. Section ten briefly describes the implementation, and section eleven contains a summary and conclusions.

September 16, 1986

DRAFT

1. Introduction
2. Mathematical Preliminaries
    a. First-order Predicate Logic
    b. Decidable, Axiomatizable Theories
    c. Automatic Theorem Proving (The Resolution Principle)
3. Previous Work
    a. Specification Methods (Algebraic, State Transition)
    b. Program Verification (Hoare Calculus, Partial and Total Correctness)
    c. Logic Programming (Prolog)
    d. Development Methods (Top-down, Transformational, Proofs as Programs)
    e. Life-cycle Models (traditional, operational, automatic programming)
    f. Software Engineering Environments
4. PLEASE (Statements, Pre-defined and User Types)
5. Producing Prototypes from Pre- and Post-conditions
6. Using PLEASE Prototypes in Software Validation
7. Refinement of IDEAL Specifications (Incremental Verification)
8. IDEAL (Goals, Development Paradigm, Components)
9. ENCOMPASS (Goals, Life-Cycle, Limitations, Components)
10. Implementation
11. Summary and Conclusions
12. References

Figure 1. Proposed Thesis Outline

4.  Completion  Criteria 

In  the  completed  thesis,  a prototype  implementation  of  ENCOMPASS  with  the  following  features 
will  be  described: 
• Rudimentary systems for object-oriented configuration control and project management.

• Tools  to  automatically  translate  PLEASE  specifications  into  Ada/Prolog  prototypes. 

• A test  harness  compatible  with  both  Ada  implementations  and  PLEASE  prototypes. 

• A language-oriented  editor  to  support  the  creation  and  refinement  of  PLEASE  specifications. 

This  prototype  will  support  a preliminary  subset  of  PLEASE  with  the  following  features: 

• A small,  fixed  set  of  types  including  natural  numbers,  lists,  booleans  and  characters. 

• The  if-then-else,  while  and  assignment  statements. 

• Procedure  calls  with  in,  out  and  in  out  parameters. 

• User  defined  functions  (without  side  effects). 

• A facility  supporting  user-defined  types  specified  using  predicate  logic. 

Throughout the thesis, the emphasis will be placed on the theoretical basis and design of these components, rather than on the creation of production-quality implementations. The emphasis will also be on the programming-in-the-small aspects of the environment, rather than on programming-in-the-large; only an architecture for ENCOMPASS will be given, while IDEAL and PLEASE will be explained in greater detail.

5. Summary

The Vienna Development Method (VDM) supports the top-down development of software specified in a notation suitable for formal verification. Components are first written using a combination of conventional programming languages and predicate logic. These abstract components are then incrementally refined into components in an implementation language. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. VDM has been used in industrial applications to enhance the development process. In such environments, VDM is applied in an informal, non-automated manner; verification conditions are generated and certified without the aid of specialized tools, and data types are not formally axiomatized. We propose that an automated environment supporting a formal development method similar to VDM can be constructed, and that the environment will enhance the development method. For the thesis, we will design and build a prototype environment, and demonstrate that it enhances the VDM-style development process. The environment will support the use of executable specifications and mechanical theorem proving, as well as providing simple facilities for configuration control and project management.
Appendix A

September 15, 1986

DRAFT

Abstract

ENCOMPASS is an integrated environment being constructed by the SAGA project to support incremental software development in a manner similar to the Vienna Development Method. In this paper, we describe the architecture of ENCOMPASS and give an example of software development in the environment. In ENCOMPASS, software is modeled as entities which may have relationships between them. These entities can be structured into complex hierarchies which may be seen through different views. The configuration management system stores and structures the components developed and used in a project, as well as providing a mechanism for controlling access. The project management system implements a milestone-based policy using the mechanism provided. In ENCOMPASS, software is first specified using a combination of natural language and PLEASE, a wide-spectrum, executable specification and design language. Components specified in PLEASE are then incrementally refined into components written in Ada¹; this process can be viewed as the construction of a proof in the Hoare calculus. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. PLEASE specifications may be used in formal proofs of correctness; they may also be transformed into executable prototypes which can be used in the validation and design processes. ENCOMPASS provides automated support for all aspects of software development using PLEASE. We believe the use of ENCOMPASS will enhance the software development process.

1. Introduction

It is both difficult and expensive to produce high-quality software. One solution to this problem is the use of software engineering environments which integrate a number of tools, methods, and data structures to provide support for program development and/or maintenance[2,17,29,34,43,54,66,79,90,93-97,108,111]. The SAGA (Software Automation, Generation and Administration) project is investigating both the formal and practical aspects of providing automated support for the full range of software engineering activities[10,18-21,49,63,98-100]. ENCOMPASS[98] is an integrated environment being created by the SAGA project to support the incremental development of software using the PLEASE[99,100] executable specification language. In this paper, we describe the architecture of ENCOMPASS and give an example of software development in the environment.

¹Ada is a trademark of the US Government, Ada Joint Program Office.

A life-cycle model describes the sequence of distinct stages through which a software product passes during its lifetime[37]. There is no single, universally accepted model of the software life-cycle[3,8,13,112]. The stages of the life-cycle generate software components, such as code written in programming languages, test data or results, and many types of documentation. In many models, a specification of the system to be built is created early in the life-cycle (many methods for specifying software have been proposed[?,42,46,47,60,76,82]). As components are produced, they are verified[37] for correctness with respect to their specifications. A specification is validated[37] when it is shown to correctly state the customers’ requirements.

Producing a valid specification is a difficult task. The users of the system may not really know what they want, and they may be unable to communicate their desires to the development team. If the specification is in a formal notation, it may be an ineffective medium for communication with the customers, but natural language specifications are notoriously ambiguous and incomplete. Prototyping and the use of executable specification languages have been suggested as partial solutions to these problems[?,41,50,61,62,65,103,113]. Providing the customers with prototypes for experimentation and evaluation early in the development process may increase customer/developer communication and enhance the validation and design processes.

Even given a validated specification, it may be difficult to determine whether an implementation is correct. Many techniques for verifying the correctness of implementations have been proposed. For example, testing can be used to check the operation of an implementation on a representative set of input data[38,74]. In a technical review process, the specification and implementation are inspected, discussed and compared by a group of knowledgeable personnel[36,106]. If the specification is in a suitable notation, formal methods can be used to verify the correctness of an implementation[48,51,52,58,73,109]. Many feel that no one technique alone can ensure the production of correct software[31,32]; therefore, methods which combine a number of techniques have been proposed[86].

To help control the complexity of software design and construction, many different development methods have been proposed[5,44,56,58,75,110]. Many of these methods are based on a model of the software development process; they combine standard representations, intellectual disciplines, and well-defined techniques in a unified framework. For example, it has been suggested that the development process be viewed as a sequence of transformations between different, but somehow equivalent, specifications[?,7,23,70,77,83].

Others have suggested that modular programming[81,101,104] and the top-down development of programs[?,44,58,107] can help reduce the difficulty of program construction and maintenance. By logically dividing a monolithic program into a number of modules, we reduce the knowledge required to change fragments of the system and decrease the apparent complexity. By using stepwise refinement to create a concrete implementation from an abstract specification, we divide the decisions necessary for an implementation into smaller, more comprehensible groups. A number of modern programming languages support modular programming[30,69,72], and environments to support such methods have been both proposed and constructed[17,93,94,111]. Methods to support the top-down development of programs have been both devised and put into use[12,14,15,27,58,75,87,88].

The Vienna Development Method (VDM) supports the top-down development of software specified in a notation suitable for formal verification[11,12,27,57-59,88]. In this method, components are first written in a language combining elements from conventional programming languages and mathematics. A procedure or function may be specified using pre- and post-conditions written in predicate logic; similarly, a data type may have an invariant. These abstract components are then incrementally refined into components in an implementation language. The refinements are performed one at a time, and each is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications.

PLEASE is a wide-spectrum, executable specification language which supports a development method similar to VDM. PLEASE extends its underlying implementation, or base, language so that a procedure or function may be specified with pre- and post-conditions, a data type may have an invariant, and an implementation may be completely annotated. At present, we are using Ada[30,105] as the base language. PLEASE specifications may be used in proofs of correctness; they also may be transformed into prototypes which use Prolog[26,64] to “execute” pre- and post-conditions, and which may interact with other modules written in the base language. We believe that the early production of executable prototypes for experimentation and evaluation will enhance the software development process.

ENCOMPASS is an integrated environment being constructed by the SAGA project to support incremental software development using PLEASE. In ENCOMPASS, software is modeled as entities which have relationships between them. These entities can be structured into complex hierarchies which may be seen through different views. The configuration management system stores and structures the components developed and used in a project, as well as providing a mechanism for controlling access. The project management system implements a milestone-based policy using the mechanism provided. In ENCOMPASS, software is first specified using a combination of natural language and PLEASE. Components specified in PLEASE are then incrementally refined into components written in Ada; this process can be viewed as the construction of a proof in the Hoare calculus[51,73]. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. ENCOMPASS provides automated support for all aspects of this development process.
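The paper does not fix a data model for entities, relationships and views, so the following Python sketch is only one plausible reading: entities are named and typed, relationships are labelled pairs of entities, a “contains” relationship forms the hierarchy, and a view is the hierarchy filtered by entity kind. All class, method, kind and relationship names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Entity:
    name: str
    kind: str  # hypothetical kinds, e.g. "development", "specification", "ada-body"


@dataclass
class ProjectBase:
    """Entities plus labelled relationships (source, label, target) between them."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add(self, name: str, kind: str) -> None:
        self.entities[name] = Entity(name, kind)

    def relate(self, src: str, label: str, dst: str) -> None:
        if src not in self.entities or dst not in self.entities:
            raise KeyError("both entities must exist before they can be related")
        self.relations.add((src, label, dst))

    def children(self, name: str) -> List[str]:
        """Entities directly below `name` in the 'contains' hierarchy."""
        return sorted(dst for src, lbl, dst in self.relations
                      if src == name and lbl == "contains")

    def view(self, root: str, kinds: Set[str]) -> List[str]:
        """Depth-first walk of the hierarchy under `root`, keeping only the given kinds."""
        seen: Set[str] = set()
        out: List[str] = []
        stack = [root]
        while stack:
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            if self.entities[name].kind in kinds:
                out.append(name)
            stack.extend(reversed(self.children(name)))  # visit children in sorted order
        return out


db = ProjectBase()
db.add("stack", "development")
db.add("stack.spec", "specification")
db.add("stack.body", "ada-body")
db.add("stack.tests", "test-data")
for part in ("stack.spec", "stack.body", "stack.tests"):
    db.relate("stack", "contains", part)
db.relate("stack.body", "implements", "stack.spec")

print(db.view("stack", {"specification", "ada-body"}))  # ['stack.body', 'stack.spec']
```

Keeping relationships as plain labelled triples lets new relationship kinds (such as “implements” above) be added without changing the entity representation.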

In section two of this paper, we describe the ENCOMPASS environment, both its architecture and the life-cycle model on which it is based. In section three, we describe IDEAL, the programming-in-the-small environment used within ENCOMPASS, and in section four, we give an example of software development using ENCOMPASS. In section five, we briefly describe the current status of the system, and in section six, we summarize the support ENCOMPASS provides for incremental software development.

2.  ENCOMPASS 

ENCOMPASS is designed to support a particular model of the software life-cycle; this is basically Fairley’s phased or waterfall life-cycle[37], extended to support the use of executable specifications and the Vienna Development Method. In ENCOMPASS, a development passes through the phases of planning, requirements definition, validation, refinement and system integration.

In the planning phase, the problem to be solved is defined and it is determined whether a computer solution is feasible and cost-effective, while in the requirements definition phase, the functions and qualities of the software to be produced by the development are precisely described[37]. In ENCOMPASS, software requirements specifications are a combination of natural language documents and components specified in PLEASE. Although the requirements specification describes a software system, it is not known whether every system which satisfies the specification will satisfy the customers. In ENCOMPASS, we extend Fairley’s phased life-cycle model to include a separate phase for customer validation.

The validation phase attempts to show that any system which satisfies the software requirements specification will also satisfy the customers, that is, that the requirements specification is valid. If not, then the requirements specification should be corrected before the development proceeds to the costly phases of refinement and system integration. To aid in the validation process, the PLEASE components in the specification may be transformed into executable prototypes which satisfy the specification. These prototypes may be used in interactions with the customers; they may be subjected to a series of tests, be delivered to the customers for experimentation and evaluation, or be installed for production use on a trial basis. We feel the use of prototypes will increase customer/developer communication and enhance the validation process.

In the refinement phase, the PLEASE specifications are incrementally transformed into Ada implementations. The refinement phase can be decomposed into a number of steps, each of which consists of a design transformation and its associated verification phase. The design transformation may produce annotated components in the base language as well as an updated requirements specification. Components which have been implemented need not be refined further, but components which are only specified will undergo further refinements until a complete implementation is produced. Each design transformation creates a new specification, whose relationship to the original is unknown. Before further refinements are performed, a verification phase must show that any implementation which satisfies the lower-level specification will also satisfy the upper-level one. In our model, this is accomplished using a combination of testing, technical review, and formal verification.
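For a single operation, one common way to state this verification obligation, following the usual VDM formulation, is as two rules: the lower-level pre-condition must accept every input the upper level accepts (the domain rule), and every output the lower level allows must be allowed by the upper level (the result rule). The Python sketch below tests both rules by enumeration over small finite domains, whereas in ENCOMPASS the corresponding conditions would be discharged by testing, review, or proof. The function name and the square-root example are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pre = Callable[[int], bool]
Post = Callable[[int, int], bool]


def refinement_failures(pre_up: Pre, post_up: Post, pre_low: Pre, post_low: Post,
                        inputs: Iterable[int],
                        outputs: Iterable[int]) -> List[Tuple[str, int, Optional[int]]]:
    """Test, on finite domains, that the lower-level specification satisfies the upper one.

    domain rule: pre_up(x) implies pre_low(x)
    result rule: pre_up(x) and post_low(x, y) imply post_up(x, y)
    """
    outs = list(outputs)
    failures: List[Tuple[str, int, Optional[int]]] = []
    for x in inputs:
        if not pre_up(x):
            continue  # the upper level promises nothing for this input
        if not pre_low(x):
            failures.append(("domain", x, None))
        for y in outs:
            if post_low(x, y) and not post_up(x, y):
                failures.append(("result", x, y))
    return failures


def nonneg(n: int) -> bool:
    return n >= 0


def any_root(n: int, r: int) -> bool:
    return r * r == n                 # upper level: some square root


def nonneg_root(n: int, r: int) -> bool:
    return r >= 0 and r * r == n      # lower level: the non-negative root


squares, candidates = [0, 1, 4, 9], range(-5, 6)
print(refinement_failures(nonneg, any_root, nonneg, nonneg_root, squares, candidates))
# []
print(refinement_failures(nonneg, any_root, lambda n: n > 0, nonneg_root, squares, candidates))
# [('domain', 0, None)]
```

The second call fails because the lower level narrows the pre-condition and so refuses the input 0, which the upper level accepts.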

PLEASE specifications enhance the verification of system components using either testing or proof techniques. The specification of a component can be transformed into a prototype; this prototype may be used as a test oracle against which the implementation can be compared. Since the specification is formal, proof techniques may be used which range from a very detailed, completely formal proof using mechanical theorem proving to a development “annotated” with unproven verification conditions. ENCOMPASS is an environment for the rigorous[58] development of programs. Although detailed mechanical proofs are not required at every step, the framework is present so that they can be constructed if necessary. Parts of a project may use detailed mechanical verification, while other, less critical parts may be handled using less expensive techniques.

The planning, requirements definition, and validation phases are sequential in nature, but during the refinement phase, some tasks may be performed in parallel. For example, suppose a specification is refined to produce a more detailed specification which contains a number of independent components. These components may be refined concurrently to produce more detailed specifications and finally implementations. These independently developed implementations must then be integrated into a complete system. In the system integration phase, separately implemented modules are integrated into successively larger units, each of which is shown to satisfy the specifications[37]. When the final integration has been performed, the acceptance tests are performed, the product is delivered and the development is complete.

In ENCOMPASS, a phase may contain a sub-development just as a development contains a number of phases. For example, if a system is very large and complex, the production of a prototype in the validation phase may itself be a complete development. If the system is composed of several major components, the production of each component from its specification during the refinement phase might also be considered a complete development. By dividing the development process hierarchically in this way, ENCOMPASS keeps each step small and comprehensible, and thereby increases management's ability to trace and control the project.
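The nesting of developments inside phases can be pictured as a small tree. The sketch below is a hypothetical Python model, not part of ENCOMPASS: `Phase`, `Development`, and `milestone_reached` are illustrative names, and it assumes that a phase is complete only when its own milestone is reached and any sub-development it contains is complete. Walking the tree gives management a list of outstanding work at every level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phase:
    name: str
    milestone_reached: bool = False
    sub_development: Optional["Development"] = None

    def complete(self) -> bool:
        # A phase that contains a sub-development is complete only when that development is.
        if self.sub_development is not None and not self.sub_development.complete():
            return False
        return self.milestone_reached


@dataclass
class Development:
    name: str
    phases: List[Phase] = field(default_factory=list)

    def complete(self) -> bool:
        return all(p.complete() for p in self.phases)

    def outstanding(self, prefix: str = "") -> List[str]:
        """Paths of incomplete phases, listing nested sub-developments first."""
        path = prefix + self.name
        result: List[str] = []
        for p in self.phases:
            if p.sub_development is not None:
                result += p.sub_development.outstanding(f"{path}/{p.name}/")
            if not p.complete():
                result.append(f"{path}/{p.name}")
        return result


factorial_dev = Development("factorial", [Phase("refinement")])
system = Development("combinatorics", [
    Phase("planning", True),
    Phase("refinement", True, factorial_dev),
])
print(system.outstanding())
# ['combinatorics/refinement/factorial/refinement', 'combinatorics/refinement']
```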
2.1. System Architecture

Figure 1 shows the top-level architecture of ENCOMPASS. The user accesses and modifies components using a set of software development tools. These include ISLET, a language-oriented editor for the construction and refinement of PLEASE specifications, and Ted[49], a proof management system which is interfaced to a number of theorem provers. The configuration management system structures the software components developed by a project and stores them in a project data base. The configuration management system also provides a primitive form of software capabilities to control access to components. The project management system distributes these capabilities to implement a management by objectives[45] approach to software development; each phase in the life-cycle satisfies an objective by producing a milestone which can be recognized by the system.

Configuration management is concerned with the identification, control, auditing, and accounting of components produced and used in software development and maintenance[1,8,9,16]. Configuration control systems and models of software configurations have been suggested as aids to configuration management[?,35,40,53,55,67,68,71,78,89,102,114]. In ENCOMPASS, software configurations are modeled using a variant of the entity-relationship model[24,25,80] which incorporates the concepts of aggregation and generalization[91,92].

An entity is a distinct, named component; an entity may have attributes which describe its properties or qualities. Two or more entities may have a relationship between them; a relationship may also have attributes. A group of entities with a relationship between them may be abstracted into an aggregate entity. This entity would have entities as the value of some or all of its attributes. A view is a mapping from names to components. A project under development has a unique base view, or project library, which describes the components of the system being developed and the primitive relationships between them. Other views can include images of entities in this base view. In ENCOMPASS, access to components is controlled through the use of views.
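A minimal Python sketch of these ideas follows, under stated assumptions: `Entity`, `Relationship`, and `View` are hypothetical names, an "image" is modeled simply as a reference to the same base-view object, and access control is reduced to refusing names that the view does not map. ENCOMPASS itself stores components in a project data base and enforces access with capabilities.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class Entity:
    """A distinct, named component whose attributes describe its properties."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship between two or more entities; it may carry its own attributes."""
    kind: str
    members: Tuple[str, ...]
    attributes: Dict[str, object] = field(default_factory=dict)


class View:
    """A mapping from names to components, built from images of base-view entities."""

    def __init__(self, base: Dict[str, Entity], visible: Iterable[str]):
        # Only the selected entities are reachable; everything else is invisible.
        self._images = {name: base[name] for name in visible}

    def open(self, name: str) -> Entity:
        if name not in self._images:
            raise PermissionError(f"{name} is not accessible through this view")
        return self._images[name]


factorial = Entity("factorial", {"kind": "module"})
k_comb = Entity("k_comb", {"kind": "module"})
# An aggregate entity: a program object whose attribute values are entities.
program = Entity("program", {"modules": [factorial, k_comb]})
uses = Relationship("uses", ("k_comb", "factorial"))

library = {e.name: e for e in (program, factorial, k_comb)}  # the base view
first_programmer_view = View(library, ["factorial"])
print(first_programmer_view.open("factorial").name)  # factorial
```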

The project management system is organized around work trays[18], which provide a mechanism to manage and record the allocation, progress, and completion of work within a software development project. In ENCOMPASS, each user may have a number of work trays, each of which may contain a number of tasks that contain software products. Project libraries are one type of task. There are four types of trays: input trays, output trays, in-progress trays, and file trays. Each user receives tasks in one or more input trays. The user may then transfer these tasks to an in-progress tray where he will perform the actions required of him and produce new products. The user may then return the task via a conceptual output tray to an input tray for the originator of the task. A user may also create new tasks in in-progress trays that he owns. These tasks may then be transferred to another user’s input tray. A task that has been transferred back into the in-progress tray of the user who created the task may be marked as complete and transferred to a file tray for long term storage.
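The tray protocol above amounts to a small set of legal moves. The Python sketch below encodes them under stated assumptions: `Tray`, `Task`, `User`, and the methods `create`, `accept`, `send`, and `file` are hypothetical names; each user is given exactly one tray of each type; and the conceptual output tray delivers a task immediately. Illegal moves fail because `list.remove` raises `ValueError` when a task is not in the expected tray.

```python
from enum import Enum
from typing import Dict, List


class Tray(Enum):
    INPUT = "input"
    IN_PROGRESS = "in-progress"
    OUTPUT = "output"
    FILE = "file"


class Task:
    def __init__(self, name: str, creator: "User"):
        self.name = name
        self.creator = creator
        self.complete = False


class User:
    def __init__(self, name: str):
        self.name = name
        self.trays: Dict[Tray, List[Task]] = {t: [] for t in Tray}

    def create(self, name: str) -> Task:
        # New tasks are created in an in-progress tray owned by the user.
        task = Task(name, self)
        self.trays[Tray.IN_PROGRESS].append(task)
        return task

    def accept(self, task: Task) -> None:
        # Move a received task from the input tray to the in-progress tray.
        self.trays[Tray.INPUT].remove(task)
        self.trays[Tray.IN_PROGRESS].append(task)

    def send(self, task: Task, receiver: "User") -> None:
        # The output tray is conceptual: a task placed there is delivered at once.
        self.trays[Tray.IN_PROGRESS].remove(task)
        self.trays[Tray.OUTPUT].append(task)
        receiver.trays[Tray.INPUT].append(self.trays[Tray.OUTPUT].pop())

    def file(self, task: Task) -> None:
        # Only the creator may mark a task complete, and only from its in-progress tray.
        if task.creator is not self:
            raise PermissionError("only the task's creator may mark it complete")
        self.trays[Tray.IN_PROGRESS].remove(task)
        task.complete = True
        self.trays[Tray.FILE].append(task)


leader, programmer = User("leader"), User("programmer_1")
task = leader.create("k_comb")
leader.send(task, programmer)
programmer.accept(task)
programmer.send(task, leader)
leader.accept(task)
leader.file(task)
print(task.complete)  # True
```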


3.  IDEAL 

ENCOMPASS may be used to develop programs which consist of many interacting modules; in this sense, it is an environment for programming-in-the-large[84,108]. IDEAL is an environment concerned with the specification, prototyping, implementation, and verification of single modules; it is the programming-in-the-small environment used within ENCOMPASS.

Figure 2 shows the top-level architecture of IDEAL, which contains four tools: TED, a proof management system which is interfaced to a number of theorem provers; ISLET (Incredibly Simple Language-oriented Editing Tool), a prototype program/proof editor; a tool to support the construction of executable prototypes from PLEASE specifications; and a test harness. The user interacts with these tools through a common interface. The tools in IDEAL operate on components which are stored in a module data base. The module data base is stored as part of a project data base by the configuration control system; IDEAL receives a capability to the module data base from the project management system. The module data base contains five types of components: symbol tables, proofs, source code, load modules, and test cases.

A set of symbol tables represents the PLEASE specifications and Ada programs being developed. These symbol tables are displayed and manipulated by ISLET, a prototype program/proof editor. ISLET can be used to create PLEASE specifications and incrementally refine them into Ada programs; this process can also be viewed as the construction of a proof in the Hoare calculus[51,73]. Some steps in the proof may generate verification conditions in the underlying first-order logic; these can be reformatted as proofs which serve as input for TED. Using TED, the user can structure the proof into a number of lemmas and bring in pre-existing theories.

The symbol tables also serve as input for the prototyping tool, which uses them to produce executable prototypes from PLEASE specifications. The source code for the prototypes is written in a combination of Prolog and Ada and uses a number of run-time support routines in both languages. The load modules produced from both prototypes and final implementations are used by the test harness. From the test harness, the user can invoke commands to manipulate test cases. Commands are available to edit or browse the input for a test case, generate output for a test case, or run a program and compare the results with output that has been previously checked for correctness.
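The "generate output, check it, then compare" workflow can be sketched in a few lines. This is a hypothetical Python illustration, not the IDEAL harness: `generate_output` and `run_and_compare` are invented names, test cases are assumed to be stored as a JSON file of input/output pairs, and `math.factorial` stands in for the prototype whose output would be checked by hand.

```python
import json
import math
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def generate_output(program: Callable[[int], int], inputs: Iterable[int], path: str) -> None:
    """Run the program on each test input and store the input/output pairs for manual checking."""
    cases = [{"input": x, "output": program(x)} for x in inputs]
    Path(path).write_text(json.dumps(cases, indent=2))


def run_and_compare(program: Callable[[int], int], path: str) -> List[Tuple[int, int, int]]:
    """Rerun the stored inputs; return (input, checked output, new output) for every mismatch."""
    failures = []
    for case in json.loads(Path(path).read_text()):
        actual = program(case["input"])
        if actual != case["output"]:
            failures.append((case["input"], case["output"], actual))
    return failures


with tempfile.TemporaryDirectory() as workdir:
    cases_file = os.path.join(workdir, "factorial_cases.json")
    # Outputs come from a stand-in prototype and would then be checked by hand.
    generate_output(math.factorial, range(8), cases_file)
    # A candidate implementation is then run against the checked outputs.
    print(run_and_compare(lambda x: math.prod(range(1, x + 1)), cases_file))  # []
```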

The  central  tool  in  IDEAL  is  ISLET.  It  not  only  manipulates  the  symbol  tables  representing 
specifications  and  implementations,  but  provides  a user  interface  and,  in  a sense,  controls  the  entire 
development  process. 
3.1. ISLET

ISLET supports both the creation of PLEASE specifications and their incremental refinement into annotated Ada implementations. This process can be viewed in two ways: as the development of a program, or as the construction of a proof in the Hoare calculus[51,73]. The refinement process consists of a number of atomic transformations. From the program view, an atomic transformation changes an unknown statement into a particular language construct; from the proof view, an atomic transformation adds another step to an incomplete proof. From the program view, defining a predicate adds a new construct to the program; from the proof view, defining a predicate adds new axioms to the first-order theory on which the proof is based.
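The dual reading of an atomic transformation can be made concrete with two refinement steps. The Python sketch below is hypothetical and far simpler than ISLET: `Unknown`, `Assign`, `Seq`, `refine_to_seq`, and `refine_to_assign` are invented names, assertions are plain strings, and substitution is textual (whole-word matching only). An unknown statement carries the triple {pre} ? {post}; refining it to a sequence splits the obligation in two, and refining it to an assignment records the verification condition given by Hoare's assignment axiom.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Unknown:
    """An unrefined statement: the open proof obligation {pre} ? {post}."""
    pre: str
    post: str


@dataclass
class Assign:
    target: str
    expr: str


@dataclass
class Seq:
    parts: list


def refine_to_seq(hole: Unknown, middle: str) -> Seq:
    # {P} ? {Q} becomes {P} ? {M} ; {M} ? {Q}: two smaller unknowns joined by a new assertion.
    return Seq([Unknown(hole.pre, middle), Unknown(middle, hole.post)])


def refine_to_assign(hole: Unknown, target: str, expr: str, vcs: List[str]) -> Assign:
    # Assignment axiom: {Q[expr/target]} target := expr {Q}. The step is valid when
    # pre implies post with expr substituted for target; record that as a VC.
    pattern = rf"\b{re.escape(target)}\b"
    substituted = re.sub(pattern, lambda _m: f"({expr})", hole.post)
    vcs.append(f"{hole.pre} -> {substituted}")
    return Assign(target, expr)


vcs: List[str] = []
stmt = refine_to_assign(Unknown("x = 0", "y = 1"), "y", "1", vcs)
print(vcs)  # ['x = 0 -> (1) = 1']
```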

Figure 3 shows the architecture of ISLET. The user interacts with ISLET through a simple language-oriented editor similar to that of [85]. The editor provides commands to add, delete, and refine constructs; as the program/proof is incrementally constructed, the syntax and semantics are constantly checked. The editor also controls the other components: an algebraic simplifier, a number of simple proof procedures, and an interface to TED. Many steps in the refinement process generate verification conditions in the underlying first-order logic. These verification conditions are first simplified algebraically and then subjected to a number of simple proof tactics. These methods can handle a large percentage of the verification conditions generated. If a set of verification conditions cannot be proved using these methods alone, the TED interface is invoked to create a proof in the proper format.
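The cheap-first strategy can be sketched as a filter. In this hypothetical Python version, a verification condition is a pair of hypotheses and a goal held as strings, the "simplifier" only trims text and drops trivial `true` hypotheses, and the three tactics are illustrative assumptions, not ISLET's actual procedures. Whatever the tactics cannot close is returned as the set that would be handed to a heavier prover such as TED.

```python
from typing import Callable, List, Tuple

VC = Tuple[List[str], str]  # (hypotheses, goal), formulas kept as plain strings


def simplify(vc: VC) -> VC:
    # Stand-in for algebraic simplification: trim text and drop trivial "true" hypotheses.
    hyps, goal = vc
    return [h.strip() for h in hyps if h.strip() != "true"], goal.strip()


def trivial_goal(hyps: List[str], goal: str) -> bool:
    return goal == "true"


def by_assumption(hyps: List[str], goal: str) -> bool:
    return goal in hyps


def by_reflexivity(hyps: List[str], goal: str) -> bool:
    # Closes goals of the form "e = e"; anything else is left alone.
    lhs, sep, rhs = goal.partition("=")
    return sep == "=" and lhs.strip() == rhs.strip()


TACTICS: List[Callable[[List[str], str], bool]] = [trivial_goal, by_assumption, by_reflexivity]


def discharge(vcs: List[VC]) -> Tuple[List[VC], List[VC]]:
    """Split VCs into those closed by the cheap tactics and those deferred to a heavier prover."""
    proved: List[VC] = []
    deferred: List[VC] = []
    for vc in vcs:
        hyps, goal = simplify(vc)
        if any(tactic(hyps, goal) for tactic in TACTICS):
            proved.append(vc)
        else:
            deferred.append(vc)
    return proved, deferred


vcs = [(["true"], "true"), (["x = 0"], "x = 0"), ([], "t1 * x = t1 * x"), (["x > 0"], "x - 1 >= 0")]
proved, deferred = discharge(vcs)
print(len(proved), deferred)  # 3 [(['x > 0'], 'x - 1 >= 0')]
```

The tactics are deliberately conservative: each one either recognizes a goal it can close or declines, so a failure simply defers the condition rather than rejecting it.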

TED can then be invoked in an attempt to prove the verification conditions. Using TED is very expensive, both in system resources and user time; however, many complex theorems can be proved with its aid. The algebraic simplification and simple proof tactics used in ISLET are very inexpensive, but they are not very powerful. The combined use of these two methods supports the rigorous[58] development of programs. Most of the verification conditions will be proven using inexpensive methods; those that are expensive to verify may be proven immediately or deferred until a later time. Parts of a system may be developed using completely mechanical methods, while other, less critical parts may use less expensive techniques.
Figure 3. Architecture of ISLET


To  further  clarify  the  concepts  and  operation  of  ENCOMPASS  and  show  how  ENCOMPASS  can 
enhance  the  software  development  process,  we  will  consider  an  example  of  software  development.  We  will 
follow  the  development  from  receipt  of  the  assignment  by  the  team  leader  through  delivery  of  a verified 
and  validated  implementation. 

4.  An  Example  of  Software  Development 

For our example, we will consider a programming team consisting of a leader and two programmers; there is a workspace for each member of the team. The team leader’s workspace contains output trays to send assignments to each of the programmers, as well as an input tray in which he receives completed tasks. Each programmer’s workspace contains an input tray in which he receives assignments from the leader and an output tray to facilitate the return of assignments to their originator. Assume that the team is assigned the task of developing a set of procedures to compute simple combinatoric quantities. The system is to be both validated by prototyping and formally verified. It will contain a procedure to calculate the factorial of a number as well as a procedure to compute the number of unique k-combinations of n items.²
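For reference, the two procedures the team must deliver compute $n!$ and $n!/(k!\,(n-k)!)$. The Python sketch below is only an illustration of that arithmetic (the team's implementations are in Ada). It assumes that `k_comb` rejects arguments outside $0 \le k \le n$, a domain choice the example does not state.

```python
def factorial(x: int) -> int:
    """x! for a natural number x."""
    if x < 0:
        raise ValueError("factorial is defined only on natural numbers")
    y = 1
    for i in range(2, x + 1):
        y *= i
    return y


def k_comb(n: int, k: int) -> int:
    """Number of unique k-combinations of n items: n! / (k! (n - k)!)."""
    if not 0 <= k <= n:
        raise ValueError("assumed domain: 0 <= k <= n")
    # The division is exact, so integer division loses nothing.
    return factorial(n) // (factorial(k) * factorial(n - k))


print(k_comb(5, 2))  # 10
```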

When the team leader receives the assignment by electronic mail, he creates a project library called k_comb in his in-progress tray. In the planning phase, the team leader consults with the customers and creates preliminary copies of two documents: the system definition and the project plan. At this point, it is decided that the system will consist of two modules: one called k_comb and one called factorial. The team leader creates a program object containing two modules with these names; each module contains an empty symbol table and set of test cases. The team leader then opens the factorial module and uses ISLET to specify the procedure factorial.

Figure 4 shows the team leader’s screen after completing the specification of factorial. The large window on the left of the screen gives the team leader access to his workspace, which contains the trays in, in_progress, out, to_programmer_1, and to_programmer_2. The small window on the left of the screen traps console messages that would otherwise disrupt the display. The windows on the right of the screen show the hierarchy of components through which the team leader accessed the factorial module. First the team leader opened the tray in_progress, which contains the project library for the k_comb task; this created the window at the bottom of the stack, labeled TRAY_TOOL. Next, he opened the project library, creating the window labeled TASK_TOOL. He then opened the program object to create the window labeled PROG_TOOL, and finally he invoked IDEAL on the factorial module to create the top window on the stack.

The top window shows the PLEASE specification of the factorial module. This specification defines a package factorial, which provides a procedure by the same name. In PLEASE, procedures are defined

² The number of k-combinations is equal to $n!/(k!\,(n-k)!)$.
[Screen image: ISLET window with a command menu (entries include Close, Help, List, Open, Put, Quit, Refine, and Undo) displaying the following specification.]

    package factorial is

    --: predicate is_fact(x : in out natural; y : in out natural) is
    --:    t1 : natural;
    --: begin
    --:    x = 0 and y = 1
    --:    or
    --:    is_fact(x - 1, t1) and y = t1 * x;
    --: end is_fact;

    procedure factorial(x : in natural; y : out natural);
    --| where in(true) and
    --|       out(is_fact(x, y));

    end factorial;


Figure  4.  Team  Leader’s  Screen  After  Specifying  factorial 


using pre- and post-conditions, which are designated by in(...) and out(...) respectively. The pre-condition for a procedure specifies the conditions the input data must satisfy before procedure execution begins. The pre-condition for factorial is true; the type declarations for the parameters give all the requirements for the input. The post-condition for a procedure states the conditions the output data must satisfy after procedure execution has completed. The post-condition for factorial is is_fact(x,y); the predicate is_fact must be true of the parameters to factorial after execution is complete.
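One way to see what in(...) and out(...) demand is to check them at run time around a procedure. The Python sketch below is a hypothetical analogue, not how PLEASE or ENCOMPASS treats annotations (which are verified, not just checked): `where` and `is_fact` are invented names, `is_fact` is used in checking mode with both arguments known, and the natural-number range that PLEASE takes from the parameter type is folded into the pre-condition check here.

```python
from functools import wraps
from typing import Callable


def is_fact(x: int, y: int) -> bool:
    """Checking mode of is_fact: x = 0 and y = 1, or is_fact(x - 1, t1) and y = t1 * x."""
    if x < 0:
        return False  # outside the natural numbers
    if x == 0:
        return y == 1
    # With x > 0, the only candidate for t1 is y / x, so x must divide y exactly.
    return y % x == 0 and is_fact(x - 1, y // x)


def where(pre: Callable[[int], bool], post: Callable[[int, int], bool]):
    """Wrap a procedure so that in(...) is checked on entry and out(...) on exit."""
    def decorate(proc: Callable[[int], int]) -> Callable[[int], int]:
        @wraps(proc)
        def checked(x: int) -> int:
            if not pre(x):
                raise AssertionError(f"in(...) violated for x = {x}")
            y = proc(x)
            if not post(x, y):
                raise AssertionError(f"out(...) violated for x = {x}, y = {y}")
            return y
        return checked
    return decorate


# in(true); the natural-number type of x is checked here as part of the pre-condition.
@where(pre=lambda x: isinstance(x, int) and x >= 0, post=is_fact)
def factorial(x: int) -> int:
    y = 1
    for i in range(2, x + 1):
        y *= i
    return y


print(factorial(5))  # 120
```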
The predicate is_fact is not pre-defined; it was developed by the team leader as factorial was specified. In PLEASE, a predicate syntactically resembles a procedure and may contain local type, variable, function, or predicate definitions. At present, predicates are specified using Horn clauses: a subset of predicate logic which is also the basis for Prolog[22,26]. This simplifies translation from PLEASE to Prolog, but limits the expressive power of PLEASE. The predicate is_fact states that x factorial is equal to y if x equals zero and y equals one, or if x minus one factorial is equal to t1 and y equals t1 times x (in other words, is_fact(x,y) is true if $(x = 0 \land y = 1) \lor ((x-1)! = t1 \land y = t1 \cdot x)$).
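Because is_fact is a set of Horn clauses, it can be run forward to produce y from x, which is what lets a prototype "execute" the post-condition. The sketch below mimics that in Python. The Prolog clauses in its comment are our own transcription, not output of the PLEASE translator, and the `X > 0` guard is an added assumption standing in for the natural type of x. `solve_is_fact` and `factorial_prototype` are hypothetical names.

```python
from typing import Iterator

# Assumed Prolog reading of is_fact (our transcription, not the translator's output):
#   is_fact(0, 1).
#   is_fact(X, Y) :- X > 0, X1 is X - 1, is_fact(X1, T1), Y is T1 * X.


def solve_is_fact(x: int) -> Iterator[int]:
    """Yield every y for which is_fact(x, y) holds, trying the clauses in order."""
    # Clause 1: x = 0 and y = 1.
    if x == 0:
        yield 1
    # Clause 2: is_fact(x - 1, t1) and y = t1 * x, for each solution t1.
    if x > 0:
        for t1 in solve_is_fact(x - 1):
            yield t1 * x


def factorial_prototype(x: int) -> int:
    """'Execute' the post-condition out(is_fact(x, y)) by taking its first solution for y."""
    for y in solve_is_fact(x):
        return y
    raise ValueError(f"no y satisfies is_fact({x}, y)")


print(factorial_prototype(5))  # 120
```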

After factorial is specified, it is prototyped. From IDEAL, the team leader issues a command which automatically creates an executable prototype from the PLEASE specification. This prototype is compatible with the IDEAL test harness; the program produced reads x from input, calls factorial, and then writes y to output. From the test harness, input data can be edited, the prototype can be used to generate output, and the output can be manually checked for correctness. The team leader uses these tools to check that the factorial prototype performs correctly on simple test data. After factorial has been prototyped, the specification and prototyping processes are repeated for k_comb, which uses factorial.

After both modules are specified and prototyped, the validation phase begins. The prototype system is delivered to the customers for evaluation; it is subjected to a series of tests, and possibly installed for production use on a trial basis. The team leader consults with the customers to produce an updated set of documents, as well as a set of acceptance tests[37] which will be used to evaluate the final implementation. These tests are stored in a form compatible with the IDEAL test harness; the implementation can be run on pre-existing input and the results compared with those produced by the prototype. After the validation phase is complete, the refinement phase begins. The production of a verified implementation which passes the acceptance tests is the milestone for completion of this phase.

First, the implementation task is decomposed into sub-tasks that can be performed in parallel. It is decided that factorial will be implemented by the first programmer, while k_comb will be implemented by the second. The team leader creates two views of the project library; both provide access to all the documents produced in the development, but one provides access to factorial while the other provides access to k_comb. The team leader then transfers the first view to the tray labeled to_programmer_1 in his workspace; this causes the view to appear in the first programmer’s input tray. Similarly, the second view is sent to the second programmer.

Figure 5 shows the workspaces of the team leader and the programmers after the transfers are complete. The team leader’s workspace contains the project library, which contains two documents, the system definition and the project plan, as well as a program object containing the modules factorial and k_comb. The first programmer’s workspace contains the first view, which contains images of the system definition, the project plan, and factorial; it does not provide access to k_comb. The view in the second programmer’s workspace is similar, but gives access to k_comb and not factorial.

When the first programmer checks his input tray, he discovers the view of the project library; he can receive more information by electronic mail or in an auxiliary document. He then opens the view, the program object, and the factorial module. Using ISLET, the programmer then refines the specification of factorial into an implementation. As the refinement is performed, verification conditions are generated automatically. As the project plan calls for a formally verified implementation, the verification conditions are mechanically certified as the refinement is performed.

After the implementation is produced, the programmer uses the test harness to run the implementation on the acceptance tests produced in the validation phase. The milestone for completion of his assignment is the production of a formally verified implementation which passes the acceptance tests. When the milestone has been reached, the programmer transfers the view of the project library to his output tray; this causes the view to appear in the team leader’s input tray. The second programmer follows a similar implement-and-verify, test, and transfer scenario with the k_comb module.

When  the  team  leader  discovers  that  both  views  are  in  his  input  tray,  he  knows  the  project  should  be 
complete.  He  checks  to  be  sure  that  the  milestone  for  the  refinement  phase  has  been  reached;  using  tools  in 
ENCOMPASS,  he  certifies  that  the  implementations  are  formally  verified  and  pass  the  acceptance  tests. 
When  the  milestone  has  been  verified,  the  project  is  delivered  to  the  customers.  At  this  point  the  project  is 
complete,  and  can  be  transferred  to  a file  tray  for  long  term  storage. 

5. System Status

The SAGA project has been active at the University of Illinois at Urbana-Champaign for over five years. ENCOMPASS has been under development since the summer of 1984; a prototype implementation has been operational since the summer of 1986. This prototype includes simple implementations of the project management and configuration control systems, as well as IDEAL. It is written in a combination of C, Csh, Prolog, and Ada. The subset of PLEASE currently implemented includes the if, while, and assignment statements, as well as procedure calls with in, out, or in out parameters. The language now supports a small, fixed set of types including natural numbers, lists, booleans, and characters. ENCOMPASS has been used to develop small programs, including the example given in this paper.

6. Summary

ENCOMPASS is an integrated environment being constructed by the SAGA project to support incremental software development in a manner similar to the Vienna Development Method. In ENCOMPASS, software is modeled as entities which may have relationships between them. These entities can be structured into complex hierarchies which may be seen through different views. The configuration management system stores and structures the components developed and used in a project, and provides a mechanism for controlling access. The project management system implements a milestone-based policy using the mechanism provided. In ENCOMPASS, software is first specified using a combination of natural language and PLEASE, a wide-spectrum, executable specification and design language. Components specified in PLEASE are then incrementally refined into components written in Ada; this process can be viewed as the construction of a proof in the Hoare calculus. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. PLEASE specifications may be used in formal proofs of correctness; they may also be transformed into executable prototypes which can be used in the validation and design processes. ENCOMPASS provides automated support for all aspects of software development using PLEASE. A prototype implementation of ENCOMPASS has been constructed at the University of Illinois at Urbana-Champaign. We believe the use of ENCOMPASS will enhance the software development process.
Abstract

PLEASE is an executable specification language which supports software development by incremental refinement. PLEASE is part of the ENCOMPASS environment, which provides automated support for all aspects of the development process. Software components are first specified using a combination of conventional programming languages and predicate logic. These abstract components are then incrementally refined into components in an implementation language. Each refinement is verified before another is applied; therefore, the final components produced by the development satisfy the original specifications. PLEASE allows a procedure or function to be specified using pre- and post-conditions, a data type to have an invariant, and an implementation to be completely annotated. PLEASE specifications may be used in proofs of correctness; they may also be transformed into prototypes which use Prolog to “execute” pre- and post-conditions. We believe the early production of executable prototypes will enhance the development process.


1.  Introduction 

It is widely acknowledged that producing correct software is both difficult and expensive. To help remedy this situation, many methods for specifying[27,30,32,33,45,58,60] and verifying[34,38,39,43,54,74] software have been developed. The SAGA (Software Automation, Generation and Administration) project is investigating both the formal and practical aspects of providing automated support for the full range of software engineering activities[8,12-15,35,48,67-69]. PLEASE is a language being developed by the SAGA group to support the specification, prototyping, and incremental development of software components[69]. PLEASE is part of the ENCOMPASS environment, which provides support for all aspects of the software development process[67,68]. In this paper we describe the development methodology for which PLEASE was created, give an example of development using the language, and describe the methods used to produce prototypes from PLEASE specifications.
A life-cycle model describes the sequence of distinct stages through which a software product passes during its lifetime[25]. There is no single, universally accepted model of the software life-cycle[3,5,11,70]. The stages of the life-cycle generate software components, such as code written in programming languages, test data or results, and many types of documentation. In many models, a specification of the system to be built is created early in the life-cycle; as components are produced they are verified[25] for correctness with respect to this specification. The specification is validated[25] when it is shown to satisfy the customers’ requirements.

Producing a valid specification is a difficult task. The users of the system may not really have defined what they want, and they may be unable to communicate their desires to the development team. A specification in a formal notation may be an ineffective medium for communication with the customers, but natural language specifications are notoriously ambiguous and incomplete. Prototyping and the use of executable specification languages have been suggested as partial solutions to these problems[?,28,37,46,47,51,70,77]. Providing the customers with prototypes for experimentation and evaluation early in the development process should increase customer/developer communication and enhance the validation and design processes.

Even given a validated specification, it may be difficult to determine if an implementation is correct. Many techniques for verifying the correctness of implementations have been proposed. For example, testing can be used to check the operation of an implementation on a representative set of input data[26,56]. In a technical review process, the specification and implementation are inspected, discussed and compared by a group of knowledgeable personnel[24,72]. If the specification is in a suitable notation, formal methods can also be used to verify the correctness of an implementation[34,38,39,43,54,74]. Many feel that no one technique alone can ensure the production of correct software[22,23]; therefore, methods which combine a number of techniques have been proposed[64].

To help manage the complexity of software design and development, methods which combine standard representations, intellectual disciplines, and well defined techniques have been proposed[?,31,41,43,57,75]. For example, it has been suggested that top-down development can help control the complexity of program construction. By using stepwise refinement to create a concrete implementation from an abstract specification, we divide the decisions necessary into smaller, more comprehensible groups. Others have suggested that the development process be viewed as a sequence of transformations between different, but somehow equivalent, specifications[?,6,17,52,59,61].

The  Vienna  Development  Method  (VDM)  supports  the  top-down  development  of  software  specified 
in  a notation  suitable  for  formal  verification[9, 10, 19,42-44,65].  In  this  method,  software  is  first  written  in 
a language  combining  elements  from  conventional  programming  languages  and  mathematics.  A procedure 
or  function  may  be  specified  using  pre-  and  post-conditions  written  in  predicate  logic;  similarly,  a data 
type  may  have  an  invariant.  These  abstract  components  are  then  incrementally  refined  into  components  in 
an  implementation  language.  The  refinements  are  performed  one  at  a time,  and  each  is  verified  before 
another  is  applied;  therefore,  the  final  components  produced  by  the  development  satisfy  the  original 
specification. 
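A data type invariant is a property that every operation must preserve. VDM discharges such obligations by proof; the Python sketch below only checks them at run time, which is a weaker, illustrative analogue. `SortedList` is a hypothetical type chosen for this example, with the invariant that its items are in non-decreasing order; `insert` has pre-condition true and must re-establish the invariant.

```python
import bisect
from typing import Iterable, List


class SortedList:
    """Hypothetical type with the invariant: items are in non-decreasing order."""

    def __init__(self, items: Iterable[int] = ()):
        self._items: List[int] = sorted(items)
        self._check()

    def _invariant(self) -> bool:
        return all(a <= b for a, b in zip(self._items, self._items[1:]))

    def _check(self) -> None:
        if not self._invariant():
            raise AssertionError("type invariant violated")

    def insert(self, x: int) -> None:
        # pre: true.  post: x has been added, and the invariant still holds.
        bisect.insort(self._items, x)
        self._check()

    def items(self) -> List[int]:
        return list(self._items)


s = SortedList([3, 1, 2])
s.insert(0)
print(s.items())  # [0, 1, 2, 3]
```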

PLEASE is a wide-spectrum, executable specification language which supports a development method similar to VDM for software written in a base language; at present we are using Ada¹ as the base language. PLEASE extends the base language so that a procedure or function may be specified with pre- and post-conditions, a data type may have an invariant, and an implementation may be completely annotated. PLEASE specifications may be used in proofs of correctness; they also may be transformed into prototypes which use Prolog[18,50] to “execute” pre- and post-conditions, and may interact with other modules written in the base language. We believe that the early production of executable prototypes for experimentation and evaluation will enhance the software development process.

In section two of this paper, we describe the development methodology PLEASE was designed to support, and in section three we give an example of software development using PLEASE. First we present an example specification and describe how it may be used to derive an executable prototype. Then we show a refinement of this specification and discuss the process of verifying that the refined specification satisfies the original. In section four, we give an example of data type specification in PLEASE, and in section five we describe the current status of the system. In section six, we summarize the impact of using the PLEASE approach in software development.

¹ Ada is a trademark of the US Government, Ada Joint Program Office.

2. Incremental Software Development

Figure 1 shows the life-cycle model PLEASE was designed to support; different perspectives are given in [13,67,68]. In our model, a customer requests that a system be constructed by the development team. In the requirements definition phase, the functions and properties of the software to be produced by the development are determined[25]. A systems analyst produces a software requirement specification[25], which precisely describes the attributes of the software to be produced. In our model, software requirements specifications include components specified in PLEASE. PLEASE specifications describe only the function of a component, not its performance, robustness or reliability. These other qualities are specified using natural language or other formalisms.

Although a software system may be shown to meet its specification, this does not necessarily imply that the system satisfies the customers’ requirements. The validation phase attempts to show that any system which satisfies the specification will also satisfy the customers’ requirements, that is, that the requirements specification is valid. If it is not, the requirements specification should be corrected before the development proceeds any further. In this phase the systems analyst interacts with the users to produce the system validation summary[67], which describes the customers’ evaluation of the software requirement specification.

To aid in the validation process, the PLEASE components in the specification may be transformed into executable prototypes which satisfy the specification. These prototypes may be used in interactions with the customers; they may be subjected to a series of tests, be delivered to the customers for experimentation and evaluation, or be installed for production use on a trial basis. The use of prototypes increases customer/developer communication and enhances the validation process. If the specification is found not to satisfy the customers, it is revised, new prototypes are produced, and the validation process is reinitiated; this cycle is repeated until a validated specification is produced.
That a prototype satisfies the customers means only that at least one implementation which satisfies the specification is acceptable. For example, the post-condition for a procedure may hold for an infinite number of output values, while the prototype will return only one of them. We say the specification of a component is complete if, for any input state, it is satisfied by only one output state. Although in some cases it is possible to require and verify that the specification of a component is complete, this is difficult in practice. We believe that while prototypes enhance the validation process, they do not replace communication with the customers and review of the specification.
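Completeness can be made concrete by brute force on a tiny domain. The Python sketch below is only an illustration, not part of PLEASE or ENCOMPASS: it assumes lists of naturals with a bounded length and bounded values so that every candidate output can be enumerated, and the names `outputs_satisfying`, `complete_on`, and the deliberately weaker `weak_post` are hypothetical. The sorting post-condition (a sorted permutation) admits exactly one output per input, whereas a post-condition that only demands a sorted output admits many, of which a prototype would return just one.

```python
from collections import Counter
from itertools import product


def perm(l1, l2):
    """True if l1 and l2 hold the same elements with the same multiplicities."""
    return Counter(l1) == Counter(l2)


def is_sorted(lst):
    """True if lst is in non-decreasing order."""
    return all(a <= b for a, b in zip(lst, lst[1:]))


def sort_post(inp, out):
    # Post-condition of SORT: the output is a sorted permutation of the input.
    return perm(inp, out) and is_sorted(out)


def weak_post(inp, out):
    # Hypothetical weaker post-condition: the output is merely sorted.
    return is_sorted(out)


def outputs_satisfying(post, inp, max_len=3, max_val=2):
    """All bounded candidate outputs `out` for which post(inp, out) holds."""
    found = []
    for n in range(max_len + 1):
        for out in product(range(max_val + 1), repeat=n):
            if post(inp, list(out)):
                found.append(list(out))
    return found


def complete_on(post, inputs, max_len=3, max_val=2):
    """Bounded completeness check: every input admits exactly one output."""
    return all(len(outputs_satisfying(post, i, max_len, max_val)) == 1
               for i in inputs)
```

For the input [2, 0, 1], `outputs_satisfying(sort_post, [2, 0, 1])` yields only [0, 1, 2], while `weak_post` is satisfied by 20 bounded candidates. A bounded check like this can refute completeness but cannot establish it for unbounded lists.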

When the validation phase is complete, the specification undergoes a refinement, or design transformation, in which more of the structure of the system is defined and implemented. This phase produces a software design specification[25], which provides a record of the design decisions made during the transformation. During the transformation, prototypes produced from PLEASE specifications may be used in experiments performed to guide the design process. The design transformation may produce annotated components in the base language as well as an updated requirements specification. Components which have been implemented need not be refined further, but components which are only specified will undergo further refinements until a complete implementation is produced.

Although a new specification has been created, its relationship to the original is unknown. Before further refinements are performed, a verification phase must show that any implementation which satisfies the lower level specification will also satisfy the upper level one. In our model, this is accomplished using a combination of testing, technical review, and formal verification. PLEASE specifications support the verification of system components by either testing or proof techniques. The specification of a component can be transformed into a prototype, which may be used as a test oracle against which the implementation is compared. Since the specification is formal, proof techniques may be used which range from a very detailed, completely formal proof using mechanical theorem proving, to a development “annotated” with unproven verification conditions. PLEASE provides a framework for the rigorous[43] development of programs. Although detailed mechanical proofs are not required at every step, the framework is present so that they can be constructed if necessary. Parts of a project may use detailed mechanical verification while other, less critical parts may be handled using less expensive techniques.

The life-cycle supported by PLEASE can be viewed as a sequence of transformations between different specification levels. On level one, the requirements definition phase transforms the customers’ desires into an initial, abstract specification, and the validation phase determines the correctness of this transformation. On level two, the specification produced on level one undergoes a design transformation, the correctness of which is determined by a verification phase. Each remaining level takes the specification produced by the next higher level as input and transforms it into a more concrete form. The most concrete components are the annotated implementations, which are produced on the lowest level.

A somewhat more complex model might view the refinement process as a search through a space of possible implementations. A given specification can have a large number of correct implementations; these can be structured as a tree in which each interior node represents a specification and each leaf node represents a correct implementation. At any time, the development is located at a given node. A design decision chooses an arc which leads from a specification to a new specification or implementation. The goal of the refinement process is to search this tree for an acceptable implementation: one which is not only correct, but also has performance and other characteristics which satisfy the users. In an actual refinement, some paths from a given specification will not lead to acceptable implementations; therefore, the refinement process may have to backtrack to find a solution. If an implementation is found inadequate, design decisions must be undone until the decision which caused the problem has been reversed. At this point a correct design decision can be made and, if possible, the rest of the development can be “replayed”[73].
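Read as an algorithm, this search is a depth-first traversal with backtracking. The Python sketch below assumes a hypothetical `Node` type in which interior nodes stand for specifications, leaves for correct implementations, and an `acceptable` flag for the performance and other user requirements; it illustrates the model only and does not attempt to replay later design decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical design-tree node: interior = specification, leaf = implementation."""
    name: str
    children: list = field(default_factory=list)
    acceptable: bool = False  # for leaves: meets performance and other goals


def search(node, path=()):
    """Depth-first search for an acceptable implementation.

    Returns the sequence of nodes (design decisions) leading to it, or None.
    Returning None from a subtree corresponds to undoing the decisions made
    inside it and backtracking to the parent specification."""
    path = path + (node.name,)
    if not node.children:
        return path if node.acceptable else None
    for child in node.children:  # each arc is one design decision
        found = search(child, path)
        if found is not None:
            return found
    return None
```

For instance, if a "first-element pivot" leaf under a quicksort specification is judged too slow, the search backs up one level and tries the next pivot strategy before abandoning quicksort altogether.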

In our model, each design transformation can be decomposed into a number of atomic transformations; if each atomic transformation is correct, then so is the design transformation. Each design transformation is verified before another is applied; this allows errors in the specification and design processes to be detected and corrected sooner and at lower cost. Within a design transformation, however, a number of atomic transformations may be performed before any are verified, since verifying each atomic transformation before the next is applied would be prohibitively expensive. Instead, the information necessary to verify each atomic transformation is recorded, and the atomic transformations are verified using an appropriate method in the corresponding verification phase.

To clarify our model further and show how PLEASE specifications enhance the development process, we will consider an example of software development. We will follow the development through requirements definition, validation of the original specification, a single design transformation, and verification of the refinement.

3. An Example of Software Development

Assume that a customer needs a component which sorts a list of natural numbers. The component should take a possibly unsorted list as input and produce as output a sorted list which is a permutation of the original. A pre-existing component implementing lists of naturals is to be re-used. In the requirements definition phase, the customer discusses his needs with the systems analyst and a requirements specification is produced. Along with other documentation, this specification might contain a component specified in PLEASE.

3.1. Specifying a Component

Figure 2 shows the PLEASE specification of such a component; to increase readability and understandability, the syntax of PLEASE/Ada is similar to that of Anna[22]. In Ada, packages are used to group logically related components[21,71]. The specification uses the pre-defined package NATURAL_LIST_PKG, which uses the PLEASE type list to define the type NATURAL_LIST as list of NATURAL. In PLEASE, as in Lisp or Prolog, lists may have varying lengths and there is no explicit allocation or release of storage; however, the strong typing of Ada is retained, and all the elements of a list must have the same type. In PLEASE, as in Prolog, the empty list is denoted by [], and a list literal is denoted by [l], where l is a comma-separated list of elements. The functions hd, tl, and cons have their usual meanings, and L1 || L2 denotes the concatenation of the elements of L1 and L2.
with NATURAL_LIST_PKG; use NATURAL_LIST_PKG;
package SORT_PKG is

  --: predicate PERM( L1, L2 : in out NATURAL_LIST ) is true if
        FRONT, BACK : NATURAL_LIST;
  --: begin
        L1 = [] and L2 = []
        or
        L1 = FRONT || cons(hd(L2), BACK) and
        PERM(FRONT || BACK, tl(L2))
  --: end;

  --: predicate SORTED( L : in out NATURAL_LIST ) is true if
  --: begin
        L = []
        or
        tl(L) = []
        or
        hd(L) <= hd(tl(L)) and SORTED(tl(L))
  --: end;

  procedure SORT( INPUT : in NATURAL_LIST; OUTPUT : out NATURAL_LIST );
  --| where in( true ),
  --|       out( PERM(INPUT, OUTPUT) and SORTED(OUTPUT) );

end SORT_PKG;


Figure 2. Specification of SORT_PKG


The specification defines a package SORT_PKG which provides a procedure called SORT. The procedure takes two arguments: the first is a possibly unsorted input list, and the second is a sorted list produced as output. The specification defines the predicates PERM (permutation) and SORTED, as well as giving pre- and post-conditions for the procedure. In PLEASE, the pre-condition for a procedure specifies the conditions that the input must meet before execution begins, while the post-condition specifies the conditions that the output must meet after execution has completed. In the specification, the state before execution begins is denoted by in(...), while the state after execution has completed is denoted by out(...). For example, the pre-condition for SORT is simply true; the type declarations for the parameters give all the requirements for the input. The post-condition for SORT states that the output is a permutation of the input and the output is sorted.

In PLEASE, a predicate syntactically resembles a procedure and may contain local type, variable, function or predicate definitions. For example, the predicate PERM states that two lists are permutations of each other if both lists are empty, or if the first element of the second list occurs in the first list and the remainders of the two lists are permutations of each other. At present, predicates are specified using Horn clauses, a subset of predicate logic which is also the basis for Prolog[16,18]. This approach allows a simple translation from predicate definitions into Prolog procedures; however, there are drawbacks. For example, in pure Horn clause programming there is no way to specify the falsehood of formulae, such as the fact that SORTED([2,1]) can never be true. The solution used in Prolog is the closed world assumption: if a fact is not provably true, then it is assumed to be false. Unfortunately, the closed world assumption may cause inconsistencies for full first-order logic[62]; therefore, at present there is no way to specify negative information in PLEASE. Eventually, we plan to extend PLEASE to support a more powerful logic.
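To make the reading of PERM and SORTED concrete, the sketch below transliterates the two predicates of Figure 2 into Python checkers, used only when both arguments are known. It is an illustration under stated assumptions, not code produced by PLEASE: Python lists stand in for NATURAL_LIST, and the names `perm`, `sorted_`, and `sort_post` are hypothetical. When no clause applies, the functions simply return False, which mirrors how the closed world assumption treats a query such as SORTED([2,1]).

```python
def perm(l1, l2):
    """PERM from Figure 2, used as a checker: both lists are empty, or
    l1 = FRONT || cons(hd(l2), BACK) for some split of l1 and
    PERM(FRONT || BACK, tl(l2)) holds."""
    if not l1 and not l2:
        return True
    if not l2:
        return False  # hd(l2) is undefined, so the second clause cannot apply
    head, tail = l2[0], l2[1:]
    # Try every way of writing l1 as FRONT || [head] || BACK.
    for i, x in enumerate(l1):
        if x == head and perm(l1[:i] + l1[i + 1:], tail):
            return True
    return False  # no clause succeeded: "false" under the closed world assumption


def sorted_(lst):
    """SORTED from Figure 2: empty, a single element, or
    hd(lst) <= hd(tl(lst)) and SORTED(tl(lst))."""
    if not lst or not lst[1:]:
        return True
    return lst[0] <= lst[1] and sorted_(lst[1:])


def sort_post(inp, out):
    """Post-condition of SORT: PERM(INPUT, OUTPUT) and SORTED(OUTPUT)."""
    return perm(inp, out) and sorted_(out)
```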

The specification contains no explicit I/O statements. Currently, all I/O is handled implicitly by the system; a program can be automatically generated which reads the in parameter to SORT from input, executes the procedure, and writes the out parameter to output. Although this approach limits PLEASE to the specification of programs with very simple I/O, it has several advantages: specifications without explicit I/O are smaller and simpler to write; omitting the sometimes messy, implementation-specific details of I/O allows specifications to be more abstract; and the interaction of the specification, rapid prototyping and test harness capabilities of ENCOMPASS is greatly simplified.

After the requirements specification has been created, it must be validated. The systems analyst can discuss the specification with the customer and obtain test data and expected results for the system. The PLEASE specification can then be used to produce a prototype which satisfies the specification. If the prototype performs correctly on the test data, it can be delivered to the customer for evaluation. If the prototype does not perform correctly, then we know the specification is invalid.
3.2. Prototyping the Specification

The specification in Figure 2 can be automatically translated into a prototype written in a combination of Prolog and Ada. Figure 3 shows a simplified version of the Prolog code which is produced. The predicates PERM and SORTED and the pre- and post-conditions for SORT are translated into Prolog procedures, which are executed by an interpreter. When the SORT procedure is called, the in parameter is converted to the Prolog representation and the call is passed to the interpreter. When the Prolog procedure for SORT completes, the out parameter is converted to the Ada representation and the original call returns. Tools in the ENCOMPASS environment perform the translation and generate code to handle I/O


perm(L1,L2) <-
    eq(L1,[]), eq(L2,[]).
perm(L1,L2) <-
    eq(L1,Temp3),
    hd(L2,Temp1),
    cons(Temp1,Back,Temp2),
    append(Front,Temp2,Temp3),
    append(Front,Back,Temp4),
    tl(L2,Temp5),
    perm(Temp4,Temp5).

sorted(L) <-
    eq(L,[]).
sorted(L) <-
    tl(L,Temp1),
    eq(Temp1,[]).
sorted(L) <-
    hd(L,Temp1),
    tl(L,Temp2),
    hd(Temp2,Temp3),
    lseq(Temp1,Temp3),
    tl(L,Temp4),
    sorted(Temp4).

sort_pre(Input,Output) <- true.

sort_post(Input,Output) <- perm(Input,Output), sorted(Output).

sort(Input,Output) <-
    sort_pre(Input,Output),
    sort_post(Input,Output).

Figure 3. Prolog Code for SORT Procedure
and other implementation level details. The Prolog procedure for SORT simply “executes” the pre- and post-conditions.

The notion of execution is quite different for pre- and post-conditions. Executing a pre-condition involves checking that given data satisfies a logical expression. Executing a post-condition means finding data that satisfies a logical expression. For example, the post-condition for SORT must find a value for the output list such that the input and output are permutations of each other and the output is sorted. To accomplish this, the Prolog procedure for the post-condition performs a naive sort of the input list. The Prolog procedure perm functions as a “generator” and the procedure sorted as a “selector”. When the procedure for the post-condition is invoked, perm is called to generate a permutation of the input list, and then sorted is called to determine whether the permutation is sorted. If sorted fails, execution backtracks and perm generates the next permutation to be evaluated. This continues until a sorted permutation is generated. The performance of this sorting algorithm is quite poor, since a list of n elements may have up to n! permutations to try; however, it can be improved by transformation techniques applied to the logical formulae involved[36,40].
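The generate-and-test behaviour of the post-condition can be mimicked outside Prolog. In the Python sketch below (an analogy, not the generated prototype), `itertools.permutations` plays the role of the generator perm and a sortedness check plays the selector sorted; moving on to the next candidate after a failed check stands in for Prolog backtracking. The enumeration order differs from that of the Prolog perm procedure, and the `attempts` counter is added purely to expose the cost.

```python
from itertools import permutations


def is_sorted(seq):
    """Selector: True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def naive_sort(inp):
    """Execute PERM(INPUT, OUTPUT) and SORTED(OUTPUT) by generate-and-test.

    Returns the first sorted permutation together with the number of
    candidates examined before it was found."""
    attempts = 0
    for candidate in permutations(inp):  # generator
        attempts += 1
        if is_sorted(candidate):         # selector; failure means "backtrack"
            return list(candidate), attempts
    raise AssertionError("unreachable: some permutation of a finite list is sorted")
```

With this enumeration order, the reversed input [4, 3, 2, 1] yields its only sorted permutation on the 24th and final attempt (4! = 24), which illustrates why the text calls the performance quite poor.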

Although many implementations show significant deviations[66], a “pure” Prolog interpreter can be viewed as a resolution theorem prover for Horn clauses[16,18]. Using this model, the translation from PLEASE predicates to Prolog code is simply a sequence of transformations between equivalent formulae. The process consists of four steps. First, the predicates are syntactically converted to the logical formulae they represent. Both the parameters to a predicate and its local variables represent universally quantified logical variables. For example, the predicate PERM in Figure 2 represents the logical formula:

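$$\forall L_1, L_2, \mathit{Front}, \mathit{Back}\;\Big(\mathit{perm}(L_1, L_2) \leftarrow \big(L_1 = [\,] \land L_2 = [\,]\big) \lor \big(L_1 = \mathit{append}(\mathit{Front}, \mathit{cons}(\mathit{hd}(L_2), \mathit{Back})) \land \mathit{perm}(\mathit{append}(\mathit{Front}, \mathit{Back}), \mathit{tl}(L_2))\big)\Big)$$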

Next, the terms on the right hand side of the implication are unraveled into conjunctions of relations. This is necessary because Prolog does not have a good notion of equality (for other solutions to this problem see [29,49]). We assume that for each function f(x), there exists a relation F(x,y) such that f(x) = y iff F(x,y). Axioms which characterize the relation F(x,y) are part of the Prolog run-time library. We unravel the formula P(..f(x)..) into the equivalent formula ∃t (F(x,t) and P(..t..)). The standard transformations to clause form are then used to convert the resultant formulae to Prolog procedures. To continue the previous example, the predicate PERM would produce the Prolog procedure:

perm(L1,L2) <-
    eq(L1,[]), eq(L2,[]).
perm(L1,L2) <-
    hd(L2,Temp1),
    cons(Temp1,Back,Temp2),
    append(Front,Temp2,Temp3),
    eq(L1,Temp3),
    append(Front,Back,Temp4),
    tl(L2,Temp5),
    perm(Temp4,Temp5).
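The unraveling step can be sketched as a small recursive pass over terms. The Python code below is a hypothetical illustration, not the ENCOMPASS translator: it assumes variables are plain strings and a function application f(a1, ..., an) is the tuple ("f", a1, ..., an). Innermost applications are replaced first by fresh TempK variables, each contributing one relation goal f(a1', ..., an', TempK), and equality is emitted with eq. Applied to the second disjunct of the PERM formula, it reproduces the clause body listed above.

```python
from itertools import count


def unravel(term, goals, fresh):
    """Replace a (possibly nested) function term by a variable.

    For each application f(a1, ..., an), append the relation goal
    f(a1', ..., an', TempK) to `goals`, using f(x) = y iff F(x, y)."""
    if isinstance(term, str):
        return term
    name, *args = term
    arg_vars = [unravel(a, goals, fresh) for a in args]  # innermost first
    temp = f"Temp{next(fresh)}"
    goals.append(f"{name}({','.join(arg_vars + [temp])})")
    return temp


def unravel_atom(pred, args, goals, fresh):
    """Unravel every argument of pred(args...), then emit the atom itself."""
    arg_vars = [unravel(a, goals, fresh) for a in args]
    goals.append(f"{pred}({','.join(arg_vars)})")


def perm_second_clause():
    """Body for: L1 = append(Front, cons(hd(L2), Back))
                 and perm(append(Front, Back), tl(L2))."""
    goals, fresh = [], count(1)
    unravel_atom("eq", ["L1", ("append", "Front", ("cons", ("hd", "L2"), "Back"))],
                 goals, fresh)
    unravel_atom("perm", [("append", "Front", "Back"), ("tl", "L2")], goals, fresh)
    return goals
```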

The prototypes produced by this translation process are partially correct[54,55] with respect to the specifications. In other words, if a prototype terminates normally, then the value returned will satisfy the post-condition. A prototype would be totally correct[54,55] if it were also guaranteed to terminate normally. The set of all logically valid formulae of predicate logic is not decidable[54,55]; therefore, in general it is not possible to extend our approach to total correctness. Furthermore, most Prolog implementations utilize an unbounded, depth-first search strategy which makes them incomplete as theorem provers; although the Prolog procedures produced by our translation process have the proper logical properties, there is no guarantee that they will terminate.

In the last step of the translation process, a number of heuristic transformations are applied to the Prolog procedures to increase the chances of termination. For example, the heuristic “move all equalities to the front of the clause” is applied to the procedure perm shown above to get the final Prolog procedure shown in Figure 3. To understand this heuristic, one must realize that the eq predicate always terminates. It can instantiate one of its arguments and thereby increase the amount of “information” available to subsequent procedures; this can increase the chances of termination. After the specification for SORT_PKG has been validated, it can be transformed into a more concrete form.
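The reordering heuristic can be pictured as a stable partition of a clause body. The Python sketch below assumes goals are represented as strings (a hypothetical representation) and reproduces only the perm clause: it moves the eq goal ahead of the others while keeping the relative order of the remaining goals, which yields the body shown in Figure 3. Figure 3 leaves the eq goal of the second sorted clause after tl, so the actual heuristic presumably applies conditions that the text does not describe.

```python
def move_equalities_first(body):
    """Move every eq goal to the front of a clause body, keeping the
    relative order of the other goals (a stable partition)."""
    eqs = [g for g in body if g.startswith("eq(")]
    rest = [g for g in body if not g.startswith("eq(")]
    return eqs + rest
```

Placing eq(L1,Temp3) first binds Temp3 to the caller's list, so the subsequent append(Front,Temp2,Temp3) splits a known list instead of enumerating unbounded ones.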
3.3. Refining the Specification

Assume that a decision is made to implement the sort procedure using the quicksort algorithm. As a first step, the original specification of SORT_PKG is refined so that SORT implements an abstraction of the quicksort algorithm. Figure 4 shows most of the refined specification. SORT_PKG contains three procedures which are called by SORT: SELECT_ELMT, PARTITION, and COMBINE. SORT has the same specification as before, but is now completely implemented. To sort the input list, SELECT_ELMT is called to select an element from the input list, and then PARTITION is called to divide the remaining elements into two sublists, LOW and HIGH, so that all the members of LOW are less than or equal to the selected element and all the members of HIGH are greater than or equal to it. The lists LOW and HIGH are then sorted recursively, and COMBINE is called to form a sorted permutation of the input from the sorted sublists.
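The refined structure of SORT can be sketched in Python with its PLEASE annotations turned into run-time assertions. This is an illustration under stated assumptions rather than a refinement of Figure 4: choosing the first element in `select_elmt` is one arbitrary design decision among the many the specification allows, Python lists stand in for NATURAL_LIST, and `Counter` equality stands in for PERM. Python `assert` statements are skipped under `python -O`, so the checks document the contracts rather than enforce them in all builds.

```python
from collections import Counter


def perm(a, b):
    """PERM: same elements with the same multiplicities."""
    return Counter(a) == Counter(b)


def is_sorted(lst):
    return all(x <= y for x, y in zip(lst, lst[1:]))


def select_elmt(lst):
    assert lst != []                     # in( LIST /= [] )
    elmt = lst[0]                        # one choice among many the spec allows
    assert elmt in lst                   # out( member(ELMT, LIST) )
    return elmt


def partition(lst, elmt):
    assert elmt in lst                   # in( member(ELMT, LIST) )
    rest = list(lst)
    rest.remove(elmt)                    # ELMT itself goes into neither sublist
    low = [x for x in rest if x <= elmt]
    high = [x for x in rest if x > elmt]
    # out( IS_PART(LIST, ELMT, LOW, HIGH) )
    assert perm(lst, low + [elmt] + high)
    assert all(x <= elmt for x in low) and all(x >= elmt for x in high)
    return low, high


def combine(sorted_l, elmt, sorted_h):
    return sorted_l + [elmt] + sorted_h  # out( LIST = SORTED_L || [ELMT] || SORTED_H )


def quicksort(inp):
    if inp == []:
        out = []
    else:
        elmt = select_elmt(inp)
        low, high = partition(inp, elmt)
        out = combine(quicksort(low), elmt, quicksort(high))
    assert perm(inp, out) and is_sorted(out)  # post-condition of SORT
    return out
```

Because LOW and HIGH each exclude the selected element, every recursive call receives a strictly shorter list, so the recursion terminates.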

The body of SORT is completely annotated; in other words, there is an assertion both before and after each executable statement. Each assertion states the conditions which must be satisfied whenever execution reaches that point in the procedure. The assertions plus the executable statements form a proof in the Hoare calculus[38,54,55]; this proof was incrementally created as the design transformation was performed. Each atomic transformation corresponds to a proof step; the transformation between Figure 2 and Figure 4 corresponds to a proof with a number of steps. Each transformation can be seen from either the program view or the proof view. For example, Figure 5 shows the first step in the refinement of the SORT procedure from both views. In the program view, an atomic transformation takes an incomplete program and produces a more concrete one; in the proof view, an atomic transformation adds another step to an incomplete proof tree. For more discussion of the relationship between proofs and programs see [7].

Although this refinement has narrowed the possible implementations to those using the quicksort algorithm, there are still many design decisions left unmade. The new specification may be refined into a family of quicksort programs; these programs might differ in many characteristics, but all would satisfy the specification. For example, the specification for SELECT_ELMT only requires that ELMT be a member of LIST; the algorithm used to select a particular element is not specified at this level of abstraction. Similarly, the specification for PARTITION only states that all the elements in LOW are less than or equal to ELMT and all the elements in HIGH are greater than or equal to ELMT; it says nothing about the algorithm used to produce these lists. As the specification is refined further, these algorithms will be defined, thereby narrowing the acceptable implementations. However, before the new specification is refined further, it must be shown that any implementation which satisfies the new specification will also satisfy the original.


procedure SELECT_ELMT( LIST : in NATURAL_LIST; ELMT : out NATURAL ) is separate;
--| where in( LIST /= [] ), out( member(ELMT, LIST) );

--: predicate IS_PART( LIST : in out NATURAL_LIST; ELMT : in out NATURAL;
--:                    LOW, HIGH : in out NATURAL_LIST ) is true if
--: begin
      PERM(LIST, LOW || [ELMT] || HIGH) and
      LSEQALL(LOW, ELMT) and GREQALL(HIGH, ELMT)
--: end;

procedure PARTITION( LIST : in NATURAL_LIST; ELMT : in NATURAL;
                     LOW, HIGH : out NATURAL_LIST ) is separate;
--| where in( member(ELMT, LIST) ), out( IS_PART(LIST, ELMT, LOW, HIGH) );

procedure COMBINE( SORTED_L : in NATURAL_LIST; ELMT : in NATURAL;
                   SORTED_H : in NATURAL_LIST; LIST : out NATURAL_LIST ) is separate;
--| where out( LIST = SORTED_L || [ELMT] || SORTED_H );

procedure SORT( INPUT : in NATURAL_LIST; OUTPUT : out NATURAL_LIST ) is
   LOW, HIGH, SORTED_L, SORTED_H : NATURAL_LIST;
   ELMT : NATURAL;
begin -- SORT
   --| true;
   if INPUT = [] then
      --| true and INPUT = [];
      OUTPUT := [];
      --| PERM(INPUT, OUTPUT) and SORTED(OUTPUT);
   else
      --| true and INPUT /= [];
      SELECT_ELMT(INPUT, ELMT);
      --| member(ELMT, INPUT);
      PARTITION(INPUT, ELMT, LOW, HIGH);
      --| IS_PART(INPUT, ELMT, LOW, HIGH);
      SORT(LOW, SORTED_L);
      --| IS_PART(INPUT, ELMT, LOW, HIGH) and PERM(LOW, SORTED_L) and SORTED(SORTED_L);
      SORT(HIGH, SORTED_H);
      --| IS_PART(INPUT, ELMT, LOW, HIGH) and PERM(LOW, SORTED_L) and
      --| SORTED(SORTED_L) and PERM(HIGH, SORTED_H) and SORTED(SORTED_H);
      COMBINE(SORTED_L, ELMT, SORTED_H, OUTPUT);
      --| PERM(INPUT, OUTPUT) and SORTED(OUTPUT);
   end if;
   --| PERM(INPUT, OUTPUT) and SORTED(OUTPUT);
end SORT;


Figure 4. Refinement of Sort Specification


Program View

begin -- SORT
   --| true;
   <unknown_1>
   --| PERM(INPUT, OUTPUT)
   --| and SORTED(OUTPUT);
end SORT;

      Refine unknown_1 into an if-then-else
      and generate appropriate assertions

begin -- SORT
   --| true;
   if INPUT = [] then
      --| true and INPUT = [];
      <unknown_2>
      --| PERM(INPUT, OUTPUT)
      --| and SORTED(OUTPUT);
   else
      --| true and INPUT /= [];
      <unknown_3>
      --| PERM(INPUT, OUTPUT)
      --| and SORTED(OUTPUT);
   end if;
   --| PERM(INPUT, OUTPUT)
   --| and SORTED(OUTPUT);
end SORT;

Proof View

{p} S1 {q}
   where p ≡ true, S1 ≡ unknown_1,
         q ≡ permutation(input, output) ∧ sorted(output).

      Instantiate S1 to an if-then-else and
      apply the proof rule for conditional statements

{p ∧ e} S2 {q},   {p ∧ ¬e} S3 {q}
-----------------------------------
{p} if e then S2 else S3 end if {q}
   where p ≡ true,
         q ≡ permutation(input, output) ∧ sorted(output),
         e ≡ input = [],
         S2 ≡ unknown_2, S3 ≡ unknown_3.

Figure 5. Refinement as Proof Construction
3.4. Verifying the Refinement

A number of different methods may be used to show that the refined specification satisfies the original. In the most informal case, inspection of the original and refined specifications by a senior designer, or a peer review process, might be used. A more rigorous approach might run prototypes produced from the original and refined specifications on the same test data and compare the results; this method gives significant assurance at low cost. However, in the words of E. W. Dijkstra, “Program testing can be used to show the presence of bugs, never to show their absence.” In the most rigorous case, mathematical reasoning would be used.
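Comparing prototypes on the same test data is a form of differential testing. The Python sketch below is a stand-in, not ENCOMPASS output: `original_prototype` imitates the generate-and-test prototype of Figure 2, `refined_prototype` imitates a quicksort prototype of the refined specification, and the random test generator and its bounds are assumptions. Comparing outputs for equality is sound here only because the sorting specification is complete; for an incomplete specification, each result should instead be checked against the post-condition.

```python
import random
from itertools import permutations


def is_sorted(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))


def original_prototype(inp):
    """Stand-in for the prototype of the original specification (generate-and-test)."""
    for candidate in permutations(inp):
        if is_sorted(candidate):
            return list(candidate)


def refined_prototype(inp):
    """Stand-in for a prototype of the refined (quicksort) specification."""
    if not inp:
        return []
    elmt, rest = inp[0], inp[1:]
    low = [x for x in rest if x <= elmt]
    high = [x for x in rest if x > elmt]
    return refined_prototype(low) + [elmt] + refined_prototype(high)


def compare(trials=200, max_len=6, seed=0):
    """Run both prototypes on the same random inputs; return the inputs on
    which they disagree (an empty list is evidence, not proof, of agreement)."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        inp = [rng.randint(0, 9) for _ in range(rng.randint(0, max_len))]
        if original_prototype(inp) != refined_prototype(inp):
            mismatches.append(inp)
    return mismatches
```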

In ENCOMPASS, the refinement process is viewed as the incremental construction of a proof in the Hoare calculus[38,54,55]; it is supported by ISLET[68], a language-oriented editor similar to [63]. ISLET provides commands to add, delete and refine constructs; as the specification is transformed into an implementation (and the proof is constructed), the syntax and semantics are constantly checked. Many atomic transformations will generate verification conditions in the underlying first-order logic. These are algebraically simplified and then subjected to a number of simple proof tactics. If these fail, input is generated for TED, a proof management system that is interfaced to a number of theorem provers[35].

The use of general-purpose theorem provers is quite expensive[1]; therefore, proofs using TED will usually not be performed during a design transformation. Simple methods are used to eliminate trivial verification conditions as they are generated; verification conditions which cannot be eliminated by these methods are recorded by ENCOMPASS for use during the corresponding verification phase. For example, Figure 6 shows the verification conditions for the transformation from Figure 2 to Figure 4 which cannot be proven by algebraic simplification and simple proof tactics alone; out of eleven refinements, only two generated non-trivial verification conditions. During the verification phase, these non-trivial formulae can be subjected to peer review, informal proof, or mechanical certification.

When all the atomic transformations have been verified, the design transformation is known to be
correct. Once the design transformation has been verified, the new specification may be refined further
and the process repeated until an implementation is produced. Although this example shows only the
specification of a procedure, PLEASE may also be used to specify other classes of components, including
data types.

INPUT = [] =>
    PERM(INPUT, []) and SORTED([])

IS_PART(INPUT, ELMT, LOW, HIGH) and
PERM(LOW, SORTED_L) and SORTED(SORTED_L) and
PERM(HIGH, SORTED_H) and SORTED(SORTED_H) and
LIST = SORTED_L || [ELMT] || SORTED_H =>
    PERM(INPUT, LIST) and SORTED(LIST)


Figure 6. Verification Conditions for Refinement

4. Specifying Data Types

It has been proposed that the use of abstract data types can enhance software specification, validation
and verification[30,32,45,53,58]. For example, Figure 7 shows the PLEASE specification of an Ada
package defining the type NATURAL_STACK to provide a stack of natural numbers. In PLEASE, a data type
has another type as its representation; for example, an object of type NATURAL_STACK is represented
using an object of type NATURAL_LIST. As in VDM[43], a type has an invariant which restricts the set
of legal representations; the invariant must be true of any values input to, or output from, functions on
the type. For example, the type NATURAL_STACK has the invariant true, meaning that all values of
type NATURAL_LIST can be interpreted as values of NATURAL_STACK.

In PLEASE, the functions on a data type are specified with pre- and post-conditions in a manner
similar to procedures. For example, the function TOP has not(EMPTY) as a pre-condition; the function
is only defined on stacks with at least one element. The post-condition for TOP states that the value
returned by the function is the head of the list given as an argument. The pre- and post-conditions for a
function are used to generate axioms which characterize its behavior. These axioms are used both in the
Prolog prototypes produced from specifications and in the proof of theorems concerning the type.
with NATURAL_LIST_PKG; use NATURAL_LIST_PKG;

package NATURAL_STACK_PKG is

    type NATURAL_STACK is new NATURAL_LIST;
    --| where S : NATURAL_STACK => true;

    function EMPTY_STACK return NATURAL_STACK;
    --| where return S : NATURAL_STACK => S = [];

    function EMPTY ( S : NATURAL_STACK ) return BOOLEAN;
    --| where return B : BOOLEAN => B = ( S = [] );

    function PUSH ( E : in NATURAL; S : NATURAL_STACK ) return NATURAL_STACK;
    --| where return NS : NATURAL_STACK => NS = cons(E, S);

    function POP ( S : NATURAL_STACK ) return NATURAL_STACK;
    --| where in( not(EMPTY(S)) );
    --| where return NS : NATURAL_STACK => NS = tl(S);

    function TOP ( S : NATURAL_STACK ) return NATURAL;
    --| where in( not(EMPTY(S)) );
    --| where return E : NATURAL => E = hd(S);

end NATURAL_STACK_PKG;


Figure 7. NATURAL_STACK in Terms of NATURAL_LIST


NATURAL_STACK_PKG defines five functions on the type NATURAL_STACK. The function
EMPTY_STACK returns an empty list to be interpreted as an empty stack, while the function EMPTY
determines if any items are on a stack. The function PUSH takes a natural number and a stack as input,
and returns a new stack which is equal to the old stack with the natural number on top. The function
POP returns a stack with the top element removed, while the function TOP returns the element at the top
of the stack. NATURAL_STACK_PKG can be used in other components to provide a stack of natural
numbers; it can be used in parameter or variable declarations, as the basis for new type definitions, or in
the specification of new software components.
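The C sketch below mirrors NATURAL_STACK_PKG with the `where` clauses turned into run-time `assert` checks. This is roughly what executing the pre- and post-conditions in a prototype achieves. It is only an approximation under stated assumptions:

- naturals are `unsigned`;
- the representation is a fixed-capacity array with the top at the highest index, instead of NATURAL_LIST;
- stacks are passed by value, so PUSH and POP return new stacks;
- the function names are hypothetical.

The invariant is `true`, so it needs no check. The checks disappear if the file is compiled with `NDEBUG`.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_CAP 32   /* capacity limit of this sketch only */

/* Hypothetical value-type stack of naturals; items[len - 1] is the top. */
typedef struct {
    unsigned items[STACK_CAP];
    int len;
} natural_stack;

/* EMPTY_STACK: post S = [] */
natural_stack empty_stack(void)
{
    natural_stack s = { .len = 0 };
    return s;
}

/* EMPTY: post B = (S = []) */
bool stack_empty(natural_stack s)
{
    return s.len == 0;
}

/* PUSH: post NS = cons(E, S). Only the new top and the length are
   checked; the remaining elements of NS are a copy of S. */
natural_stack stack_push(unsigned e, natural_stack s)
{
    assert(s.len < STACK_CAP);
    natural_stack ns = s;
    ns.items[ns.len++] = e;
    assert(ns.len == s.len + 1 && ns.items[ns.len - 1] == e);
    return ns;
}

/* POP: pre not(EMPTY), post NS = tl(S) */
natural_stack stack_pop(natural_stack s)
{
    assert(!stack_empty(s));
    natural_stack ns = s;
    ns.len--;
    return ns;
}

/* TOP: pre not(EMPTY), post E = hd(S) */
unsigned stack_top(natural_stack s)
{
    assert(!stack_empty(s));
    return s.items[s.len - 1];
}
```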
5. System Status

The SAGA project has been active at the University of Illinois at Urbana-Champaign for over five
years. The ENCOMPASS environment has been under development since the summer of 1984. A
prototype implementation of ENCOMPASS has been operational since the summer of 1986; it is written in a
combination of C, Csh, Prolog and Ada. This prototype includes the tools necessary to support software
development using PLEASE: an initial version of ISLET, the language-oriented editor used to create
PLEASE specifications and refine them into Ada implementations; software which automatically translates
PLEASE specifications into Prolog procedures and generates the support code necessary to call these
procedures from Ada; the run-time support routines and axiom sets for a number of pre-defined types; and
interfaces to the ENCOMPASS test harness and TED. The subset of PLEASE currently implemented
includes the if, while, and assignment statements, as well as procedure calls with in, out or in out
parameters. The language now supports a small, fixed set of types including natural numbers, lists, booleans and
characters. PLEASE and ENCOMPASS have been used to develop small programs, including
specification, prototyping, and mechanical verification.

6. Summary

PLEASE is an executable specification language which supports program development by incremental
refinement. PLEASE is part of the ENCOMPASS environment, which provides automated support for
all aspects of the software development process. Software components are first specified using a
combination of conventional programming languages and predicate logic. These abstract components are then
incrementally refined into components in an implementation language. Each refinement is verified before
another is applied; therefore, the final components produced by the development satisfy the original
specifications.

PLEASE specifications can be transformed into prototypes which use Prolog to “execute” pre- and
post-conditions. We believe that the early production of executable prototypes for experimentation and
evaluation will enhance the development process. Prototypes can increase the communication between
customer and developer, thereby enhancing the validation process. Prototypes produced from PLEASE
specifications can be used in experiments performed to guide the design process. Prototypes produced from
a PLEASE specification and its refinement can be run on the same test data and the results compared; this
method can give significant assurance that a refinement is correct at a low cost. PLEASE prototypes are
based on existing Prolog technology, and their performance will improve as the speed of Prolog
implementations increases (commercial Prolog compilers which produce native code compatible with conventional
languages are already available[2]). As logic programming progresses, new versions of PLEASE can be
built based on more powerful logics. We believe that the use of methods similar to those based on
PLEASE specifications will enhance the design, development, validation and verification of software.
1. INTRODUCTION


The cost and difficulty of producing correct software are well-known problems in the computer
industry. To help alleviate these problems, methods for specifying[8,14,15,20,22,23] and
verifying[?,11,14,21,30] software have been developed. The SAGA (Software Automation, Generation, and
Administration) project[1,2,4,10,18,19,27] is investigating the formal and practical aspects of providing
automated support for a broad spectrum of software engineering activities. The PLEASE language[28] is
being developed by the SAGA group to support the specification, prototyping, and rigorous development
of software components. In this thesis, I describe a set of Prolog run-time support libraries for PLEASE.

Many programming methodologies have been proposed to help control the complexity of software
design and development[12,14,29,31]. In top-down development methods, large programming problems are
decomposed into a number of smaller, less complex problems. Top-down development methodologies have
been defined[14,24] and implemented[20]. Using stepwise refinement[?], we start with an abstract
specification of the problem and iteratively transform it into a real implementation; thus, the necessary
development decisions are divided into smaller, more comprehensible groups. The specification is a precise
statement of the function of the system. As the specification is incrementally refined, various software
components, such as programs, test data, and various types of documentation, are generated. After each
iteration, the components of the system are verified for correctness with respect to the specification.

Too often, systems are delivered which do not satisfy their users. A specification which accurately
reflects the desires of the customer is difficult to produce. We say a specification is validated when it is
shown to satisfy the customer’s requirements. A formal specification may be difficult for the users to
understand. It is easy to generate an informal specification, but it may be difficult to produce a system
from a natural language description. Prototyping[7] and executable specification languages[?,17,22,32]
may help alleviate these problems. Providing prototypes for the customers to use and evaluate early in the
development process may increase communication between the customers and the developers. Once a valid
specification is produced, a real system can be developed from it. Testing or formal verification techniques
may be used to show that an implementation meets the requirements of the specification.
The Vienna Development Method[14,28] is an example of a top-down development methodology.
The Vienna Development Method (VDM) contains methods for the formal verification of system
components. In VDM, systems are specified in a language which combines elements of conventional
programming languages and mathematics. Pre- and post-conditions written in predicate logic specify procedures.
Invariants for user-defined data types are logical expressions which must be true both before and after the
execution of any procedure which manipulates the data type. To enhance the expressive power of
specifications, VDM adds the data types list, set, and map. These abstract programs are incrementally
refined into programs coded in an implementation language. Each refinement is verified for correctness.
Therefore, the final program produced by the development satisfies the original specification.

The PLEASE programming language is designed to support a methodology similar to the Vienna
Development Method. In PLEASE, a procedure or function may be specified with pre- and
post-conditions written in predicate logic, and a user-defined data type, called an object, may have an
invariant. PLEASE specifications may be used in proofs of correctness. They may also be transformed into
Prolog[5] prototypes. PLEASE specifications may interact with modules written in conventional
languages.

In section 2 of this thesis, I describe the PLEASE programming language in more detail, giving an
example PLEASE specification and its Prolog prototype. Section 3 contains a description of the run-time
architecture of PLEASE. The representations of the PLEASE data types list, set, and map are described
in section 4, along with a description of the input/output support library and some miscellaneous
functions useful in prototype development. In section 5, I summarise and draw some conclusions from the
work done for this thesis.
2. THE PLEASE PROGRAMMING LANGUAGE


PLEASE is an extension to the programming language Path Pascal[3], which is an extension to
standard Pascal[13]. In Path Pascal, an encapsulated data type, called an object, defines a block which
follows the scope rules of standard Pascal. The object definition includes declarations of local variables
which are only accessible by procedures and functions defined within the object. Entry procedures or
functions, called operations, may be called from within the scope containing the object declaration. Objects
provide a facility for defining encapsulated data types; the data within the object may only be accessed and
manipulated outside the object definition through entry operations. In Path Pascal, an initialisation
procedure (initially block) is called when an instance of the object is created. Path Pascal allows
asynchronous execution of program structures called processes. Processes communicate through shared data
structures within an object. Each object has a path expression specifying synchronisation constraints for the
processes, functions, and procedures within the object.

In PLEASE, procedures may be specified with pre- and post-conditions written in predicate logic.
Pre- and post-conditions[21] are logical expressions specifying conditions which must be true when a
procedure is entered and exited. The pre-condition for a procedure specifies constraints on the input
parameters and global variables which must be met when the procedure is entered. The post-condition must be
true when the procedure is exited; it specifies the conditions the output must satisfy. Data-type invariants
may be specified for objects. The data-type invariant is a logical expression which must be true both
before and after an object is modified. In other words, the data-type invariant is part of the pre- and
post-condition of every operation on an object.

Figure 1 shows a PLEASE specification of the abstract data type stack of integers. In the
specification, a stack is defined as a Path Pascal object. The operations on a stack are push, pop, empty,
and top. The path expression for the stack specifies that the operations may be performed in any order.

type stack = object

    path push, pop, top, empty end;

    var s : list of integer;

    invariant;
        begin true end;

    entry procedure push(elmt : integer);
        precondition;
            begin true end;
        postcondition;
            begin s' = <elmt> || s end;

    entry procedure pop;
        precondition;
            begin true end;
        postcondition;
            begin s' = tl(s) end;

    entry function top : integer;
        precondition;
            begin not(empty) end;
        postcondition;
            begin s' = s and top' = hd(s) end;

    entry function empty : boolean;
        precondition;
            begin true end;
        postcondition;
            begin
                (empty' = true and s = empty_list) or
                (empty' = false and s <> empty_list)
            end;

    initially;
        precondition;
            begin true end;
        postcondition;
            begin s' = empty_list end;

end; (* stack *)


Figure 1. Stack of Integers in Terms of list of integer

The stack in the example is represented with a list of integers. A list in PLEASE is similar to a list
in LISP or Prolog. It is an ordered sequence of elements, all of which are of a uniform type. The list has
no specified length and may grow and shrink according to the operations performed on it. The basic
operations performed on a list are finding its head or tail, appending two lists to form a new list, and
determining if a list is empty. When a stack is created, the initially block is executed and the list is
initialised with an empty list. New elements are pushed on the stack by inserting them at the front of the
list. Elements are popped from the stack by removing them from the front of the list.

In PLEASE, the notation s' is used to denote the value of the variable s after the procedure is
executed. Lists are specified in PLEASE by enumerating their elements between the symbols "<" and ">".
The notation "||" is used to specify the concatenation of two lists.
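The list notation can be illustrated with immutable cons cells in C. In this hypothetical sketch:

- `NULL` plays the role of `<>`;
- `cons(e, s)` builds `<e> || s` while sharing `s`, so push's post-condition `s' = <elmt> || s` gives `hd(s') = elmt` and `tl(s') = s`;
- `concat(a, b)` builds `a || b` by copying only the cells of `a`.

The names are assumptions, not the PLEASE library interface, and cells are never freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical immutable list of integers; NULL is the empty list <>. */
typedef struct cell {
    int head;
    const struct cell *tail;
} cell;
typedef const cell *list;

/* <e> || s : a new cell whose tail is s itself (no copy). */
list cons(int e, list s)
{
    cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->head = e;
    c->tail = s;
    return c;
}

int hd(list s) { assert(s != NULL); return s->head; }
list tl(list s) { assert(s != NULL); return s->tail; }
bool is_empty_list(list s) { return s == NULL; }

/* a || b : copies the cells of a, shares b. */
list concat(list a, list b)
{
    if (a == NULL)
        return b;
    return cons(a->head, concat(a->tail, b));
}

/* Structural equality, used to state post-conditions such as s' = s. */
bool list_equal(list a, list b)
{
    while (a != NULL && b != NULL) {
        if (a->head != b->head)
            return false;
        a = a->tail;
        b = b->tail;
    }
    return a == NULL && b == NULL;
}
```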

This PLEASE specification may be transformed by an expert prototyper into Prolog procedures
which may then be executed. Prolog[5] is a programming language based on predicate logic. Prolog
procedures are goals which may be satisfied by a state-space search. Prolog’s backtracking mechanism
automatically searches the state-space, finding any or all possible solutions for Prolog goals. Therefore,
Prolog procedures may be used both to check whether the inputs satisfy the pre-condition and to find the
outputs which satisfy the post-condition.
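For very simple cases, Prolog's search can be imitated by generate-and-test over a bounded domain. The C sketch below assumes a hypothetical integer square-root specification. Its pre-condition is `n >= 0` and its post-condition is `r*r <= n < (r+1)*(r+1)`. `isqrt_by_search` checks the pre-condition, then tries candidates in order until one satisfies the post-condition. This is not how PLEASE prototypes are generated; it only illustrates running a post-condition to find an output.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-condition of the hypothetical isqrt specification. */
static bool isqrt_pre(long n) { return n >= 0; }

/* Post-condition, written as a test on a candidate result r. */
static bool isqrt_post(long n, long r)
{
    return r >= 0 && r * r <= n && n < (r + 1) * (r + 1);
}

/* "Execute" the specification: check the pre-condition, then try
   r = 0, 1, 2, ... until the post-condition holds. Returns -1 if the
   pre-condition fails. Assumes n is small enough that (r + 1) * (r + 1)
   cannot overflow. */
long isqrt_by_search(long n)
{
    if (!isqrt_pre(n))
        return -1;
    for (long r = 0; r <= n; r++)
        if (isqrt_post(n, r))
            return r;
    return -1; /* not reached when n >= 0 */
}
```

Like a naive prototype, the search satisfies the post-condition by construction but is slow. A refinement would replace it with an efficient algorithm that must be shown to satisfy the same post-condition.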

Figure 2 gives the Prolog prototype created from the stack object specification. The pre- and
post-conditions for each operation in the stack object have been transformed into Prolog procedures. Each
operation is performed by executing the corresponding pre- and post-condition. Note, particularly, the
Prolog procedure for the function top. The top function returns the element on the top of the stack. The
pre-condition for the top function specifies that the stack must not be empty when top is entered.
Executing the pre-condition involves checking for the condition when the function is called. The post-condition
is executed just before the function returns. In the top procedure, the post-condition unifies the return
value with the head of the list representing the stack. There are a number of ways a prototyping expert
may code the pre- and post-conditions in Prolog. In this example, since the data type invariant for a
stack is always true, the expert prototyper has not included it in the prototypes. Normally, the invariant
would be checked in the procedure for each pre- and post-condition.

push_pre_condition :- true.
push_post_condition(S, int(Elmt), S_Prime) :-
    list_hd(S_Prime, int(Elmt)),
    list_tl(S_Prime, S).
push(S, int(Elmt), S_Prime) :-
    push_pre_condition,
    push_post_condition(S, int(Elmt), S_Prime).

pop_pre_condition :- true.
pop_post_condition(S, S_Prime) :-
    list_tl(S, S_Prime).
pop(S, S_Prime) :-
    pop_pre_condition,
    pop_post_condition(S, S_Prime).

top_pre_condition(S) :-
    empty(S, false).
top_post_condition(S, Top) :-
    list_hd(S, Top).
top(S, Top) :-
    top_pre_condition(S),
    top_post_condition(S, Top).

empty_pre_condition :- true.
empty_post_condition(S, true) :-
    list_empty(S).
empty_post_condition(S, false) :-    % false only for a non-empty list
    not(list_empty(S)).
empty(S, Empty) :-
    empty_pre_condition,
    empty_post_condition(S, Empty).

initially_pre_condition :- true.
initially_post_condition(S_Prime) :-
    list_empty(S_Prime).
initially(S_Prime) :-
    initially_pre_condition,
    initially_post_condition(S_Prime).


Figure 2. Prolog Prototype for Stack Object

The stack object was specified with the PLEASE data type list. In addition to lists, PLEASE defines
the data types set and map. A PLEASE set is an unordered collection of elements, all of which must be of
the same type. PLEASE sets are not multisets. The basic operations on sets are determining if an element
is a member of a set, finding the union or intersection of two sets, and determining if one set is a subset of
another.
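A set without duplicates can be kept in a plain sequence, provided that every insertion checks membership first. The following C sketch assumes integer elements, a fixed capacity and hypothetical names. It shows membership, union, intersection and subset while preserving the "not a multiset" property.

```c
#include <assert.h>
#include <stdbool.h>

#define SET_CAP 64   /* capacity limit of this sketch only */

/* Hypothetical set of integers: elems[0..n-1] holds no duplicates. */
typedef struct {
    int elems[SET_CAP];
    int n;
} int_set;

bool set_member(int x, const int_set *s)
{
    for (int i = 0; i < s->n; i++)
        if (s->elems[i] == x)
            return true;
    return false;
}

/* Insert x unless it is already present, so the set never becomes a multiset. */
void set_add(int x, int_set *s)
{
    if (set_member(x, s))
        return;
    assert(s->n < SET_CAP);
    s->elems[s->n++] = x;
}

int_set set_union(const int_set *a, const int_set *b)
{
    int_set r = { .n = 0 };
    for (int i = 0; i < a->n; i++) set_add(a->elems[i], &r);
    for (int i = 0; i < b->n; i++) set_add(b->elems[i], &r);
    return r;
}

int_set set_intersection(const int_set *a, const int_set *b)
{
    int_set r = { .n = 0 };
    for (int i = 0; i < a->n; i++)
        if (set_member(a->elems[i], b))
            set_add(a->elems[i], &r);
    return r;
}

bool set_subset(const int_set *a, const int_set *b)
{
    for (int i = 0; i < a->n; i++)
        if (!set_member(a->elems[i], b))
            return false;
    return true;
}
```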

A map is similar to a relation in mathematics. It is a finite set of domain element - range element
pairs. For each element in the domain there is at most one pair in the map. The pairs may be specified
individually or by a domain set and a function which defines the corresponding elements in the range set.
Adhering to the strong type checking of Pascal, all elements of a set (including the domain and range sets
of a map) must be of a uniform type.
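The at-most-one-pair rule is what separates a map from an arbitrary relation. This C sketch assumes integer domain and range elements and uses hypothetical names. It keeps the pairs in an array, and `map_put` enforces the rule by overwriting the existing pair for a domain element. The text does not say whether a PLEASE map update overwrites or rejects, so overwriting is this sketch's own choice. `map_from_function` builds a map from a domain set and a function, and `square` is just an example range function for the usage check.

```c
#include <assert.h>
#include <stdbool.h>

#define MAP_CAP 64   /* capacity limit of this sketch only */

typedef struct { int dom, rng; } pair;

/* Hypothetical finite map: no two pairs share the same dom value. */
typedef struct {
    pair pairs[MAP_CAP];
    int n;
} int_map;

/* Look up d; returns true and stores the range element in *r if found. */
bool map_apply(const int_map *m, int d, int *r)
{
    for (int i = 0; i < m->n; i++)
        if (m->pairs[i].dom == d) {
            *r = m->pairs[i].rng;
            return true;
        }
    return false;
}

/* Add the pair (d, r), overwriting any existing pair for d. */
void map_put(int_map *m, int d, int r)
{
    for (int i = 0; i < m->n; i++)
        if (m->pairs[i].dom == d) {
            m->pairs[i].rng = r;
            return;
        }
    assert(m->n < MAP_CAP);
    m->pairs[m->n++] = (pair){ d, r };
}

/* Build a map from a domain set and a function on it. */
int_map map_from_function(const int *dom, int n, int (*f)(int))
{
    int_map m = { .n = 0 };
    for (int i = 0; i < n; i++)
        map_put(&m, dom[i], f(dom[i]));
    return m;
}

/* Example range function used in the usage check. */
static int square(int x) { return x * x; }
```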

PLEASE specifications contain pre- and post-conditions written in predicate logic and are useful in
formal proofs of correctness. They are also easily transformed into executable Prolog prototypes. The
data types list, set, and map are powerful tools for data abstraction and should be very useful in specifying
systems. PLEASE specifications are incrementally refined into source modules coded in implementation
languages such as C, Pascal, or Ada.
3. RUN-TIME ARCHITECTURE

As systems are refined from a specification to a real implementation, the modules specified in
PLEASE will be expanded into routines coded in various implementation languages such as C, Pascal, and
Ada. Therefore, there will be modules written in conventional languages and modules consisting of
PLEASE prototypes written in Prolog. Since we do not have a Prolog compiler with an interface to
standard implementation languages, we must be able to link object modules generated from conventional
languages to Prolog procedures.

One way to do this is to provide a standard text interface from a conventional language to Prolog.
The Prolog code is encoded as text in implementation language source modules and sent to a Prolog
interpreter for execution. Parameters to the Prolog procedures are passed to the module containing the Prolog
code, converted into text, and placed in the Prolog interface calls. The output parameters are converted
from text into the calling language representation and returned. To execute Prolog procedures through a
text interface from implementation language modules, the code for the Prolog procedures must be asserted
in the Prolog data-base. Then procedure "calls" may be made by sending commands to Prolog to execute
the code.

[Diagram: User Files]
Figure 3. Interprocess Communication - Files Manipulated by C

The PLEASE run-time architecture provides such an interface. The host process is an object
program created from various source modules written in an implementation language. A separate process
runs the UNSW Prolog interpreter[25]. Figure 3 illustrates how these processes communicate through
Unix¹ pipes; the host process sends commands down a pipe to the Prolog interpreter, which returns the
results through another pipe.

Figure 4 illustrates how a C program "calls" Prolog. The c_to_plg library provides the standard text
interface from C language modules to Prolog. The file "c_to_plg.h" is included to make all the necessary
declarations and definitions for using the c_to_plg library. Prolog code is stored in a P_BUF and sent to
the Prolog interpreter with c_to_plg_call. A P_BUF is a C string, up to 4K bytes in length, and may be
used in standard C string operations. Since P_BUFs are C strings, they must be terminated with a
"\0". Another P_BUF must be provided to receive the output generated by Prolog.
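Because a P_BUF has a fixed size, composing Prolog text with `sprintf`, as Figure 4 does, can silently overrun the buffer once the text grows. A defensive alternative is sketched below. It assumes that P_BUF is a 4096-byte `char` array; its real declaration in c_to_plg.h is not shown. `pbuf_format` and `pbuf_append` are hypothetical helpers built on the standard `vsnprintf`, and both report failure instead of overflowing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed stand-in for the P_BUF type: a 4K-byte C string. */
typedef char P_BUF[4096];

/* Format into a P_BUF. Returns false if the text did not fit; the buffer
   then holds a truncated but still NUL-terminated string. */
bool pbuf_format(P_BUF buf, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof(P_BUF), fmt, ap);
    va_end(ap);
    return len >= 0 && (size_t)len < sizeof(P_BUF);
}

/* Append text to a P_BUF. Returns false, leaving buf unchanged,
   if the result would not fit. */
bool pbuf_append(P_BUF buf, const char *text)
{
    size_t used = strlen(buf), add = strlen(text);
    if (used + add >= sizeof(P_BUF))
        return false;
    memcpy(buf + used, text, add + 1);   /* also copies the terminating NUL */
    return true;
}
```

The code uses `sizeof(P_BUF)` rather than `sizeof buf` because an array parameter decays to a pointer, so `sizeof buf` would give the size of a pointer.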

In this example, the C function assert adds all the procedure definitions for the stack object to the
Prolog data base. Once these definitions have been added, they remain until the end of execution;
therefore, the definitions only have to be asserted once. In the test function, two calls are made to the
procedures in the stack object. The first call pushes the integer "3" onto the stack. The second call uses
the top function to see if it was pushed properly. The first parameter in c_to_plg_call is the input buffer.
The second buffer receives Prolog’s output. When this program is run, it outputs "X=int(3)".

#include <stdio.h>
#include "c_to_plg.h"


assert()    /* assert definitions of stack object */
{
    P_BUF inbuf;     /* Prolog input buffer */
    P_BUF outbuf;    /* Prolog output buffer */

    sprintf(inbuf, "%s %s %s %s %s %s %s",
        "push_pre_condition :- true. ",
        "push_post_condition(S,int(Elmt),S_Prime) :- ",
        " list_hd(S_Prime,int(Elmt)), ",
        " list_tl(S_Prime,S). ",
        "push(S,int(Elmt),S_Prime) :- ",
        " push_pre_condition, ",
        " push_post_condition(S,int(Elmt),S_Prime). ");
    c_to_plg_call(inbuf, outbuf);

    /* rest of the code for the stack would also be asserted */
}


test()    /* test push and top */
{
    P_BUF inbuf;     /* Prolog input buffer */
    P_BUF outbuf;    /* Prolog output buffer */

    sprintf(inbuf, "push(S,int(3),S_Prime)! ");
    c_to_plg_call(inbuf, outbuf);

    sprintf(inbuf, "top(S,X)?");
    c_to_plg_call(inbuf, outbuf);
    printf("%s", outbuf);
}


Figure 4. Excerpt from C Program Testing Stack of Integers

When c_to_plg_call is invoked, the input string is sent through a pipe to Prolog’s standard input.
The first time c_to_plg_call is executed, it starts the Prolog process and sets up the necessary interprocess
communication channels. Prolog executes the instructions received on its standard input and writes the
output onto its standard output, which is piped back to the calling process. The c_to_plg_call function
should return when Prolog has written all its output. Since the Prolog interpreter does not flush its output
pipe when it has finished writing, the calling routine must tell it when to do so. C_to_plg_call sends a flush
command to Prolog after the user’s input is sent down the pipe. When the user instructions have been
executed, the flush command causes all output to be sent to the calling process by the operating system.
C_to_plg_call assembles the output from Prolog into the output buffer and returns when Prolog has
completed the flush.

¹ Unix is a trademark of AT&T Bell Laboratories.
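The cycle described above (start on first call, send the request, flush, collect the reply) can be sketched with the POSIX calls `pipe`, `fork`, `dup2` and `execvp`. The implementation of c_to_plg_call is not shown here, so everything below is an assumption. The helper is named `coproc_call`. The flush is modelled as a sentinel line that the caller sends after each request and then reads back as the end-of-reply marker. So that the sketch can be tested without a Prolog interpreter, the child in the usage check is `cat`, which echoes its input. With a real interpreter, the sentinel would have to come from a command that writes it after flushing.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical end-of-reply marker standing in for the flush command. */
#define SENTINEL "%%DONE%%\n"

static pid_t child = -1;        /* started lazily on the first call */
static int to_child = -1;       /* our requests -> child's stdin */
static int from_child = -1;     /* child's stdout -> our replies */

/* Fork the child with its stdin and stdout connected to two pipes. */
static int start_child(char *const argv[])
{
    int in[2], out[2];
    if (pipe(in) < 0 || pipe(out) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]);
        close(out[0]); close(out[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    close(in[0]);               /* parent keeps in[1] and out[0] */
    close(out[1]);
    to_child = in[1];
    from_child = out[0];
    return 0;
}

/* Send request, then the sentinel; read until the sentinel comes back.
   Stores the reply (sentinel removed) in reply and returns its length,
   or -1 on error or if the reply does not fit in cap bytes. */
long coproc_call(char *const argv[], const char *request,
                 char *reply, size_t cap)
{
    assert(request != NULL && reply != NULL && cap > 0);
    if (child < 0 && start_child(argv) < 0)
        return -1;
    if (write(to_child, request, strlen(request)) < 0 ||
        write(to_child, SENTINEL, strlen(SENTINEL)) < 0)
        return -1;

    size_t len = 0, slen = strlen(SENTINEL);
    for (;;) {
        if (len + 1 >= cap)
            return -1;
        ssize_t got = read(from_child, reply + len, cap - 1 - len);
        if (got <= 0)
            return -1;
        len += (size_t)got;
        reply[len] = '\0';
        if (len >= slen && strcmp(reply + len - slen, SENTINEL) == 0) {
            len -= slen;
            reply[len] = '\0';
            return (long)len;
        }
    }
}
```

On a blocking pipe, `write` normally transfers the whole request, so the sketch does not loop on partial writes. A robust version would, and would also handle `SIGPIPE` if the child exits.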


The run-time architecture places some restrictions on the way Prolog modules interact with the file
system. Prolog’s standard input and standard output are used for interprocess communication; therefore,
they may not be used for file access. In Unix, each process gets a unique file descriptor for a file.
Therefore, separate processes writing to the same file may overwrite one another’s changes. All file processing
may be done from either the implementation language modules or the Prolog modules, but because of the
danger of processes overwriting one another’s files, file processing should not be mixed between Prolog
modules and implementation language modules. Figure 5 illustrates the case in which all file processing is done from
the Prolog modules. A library of Pascal-like file manipulation routines is provided for Prolog.

[Diagram: User Files]
Figure 5. Interprocess Communication - Files Manipulated by Prolog

The ptoplg library provides a Prolog interface for Pascal and Path Pascal. Since standard Pascal
does not support strings, the type plgbuf and operations on it are defined. Figure 6 shows how a Pascal
program might execute Prolog code to test the stack specification. The file "ptoplg.h" contains the
definitions necessary for using the ptoplg library. A plgbuf is a 4K-byte Pascal string. A plgbuf is
initialised and cleared with plgbufinit. Strings are appended to the existing contents of a plgbuf with
plgbufappend. These strings must be terminated with a "$". Ptoplgcall works in the same way as
c_to_plg_call and is, in fact, implemented with c_to_plg_call. Plgbufwrite prints the contents of a plgbuf on
standard output. The output produced by this procedure is "X=int(3)".

#include "ptoplg.h"

procedure test;
var i, o : plgbuf;
begin
    plgbufinit(i);
    plgbufappend(i, 'push(S,int(3),S_Prime)! $');
    ptoplgcall(i, o);

    plgbufinit(i);
    plgbufappend(i, 'top(S,X)? $');
    ptoplgcall(i, o);
    plgbufwrite(o)
end;


Figure 6. Calling Prolog from Path Pascal

In order to support the data types list, set, and map defined in the Vienna Development Method,
standard Prolog representations of these types were defined and libraries of procedures were developed to
support these representations. In addition, a set of file input/output procedures based on those provided
by standard Pascal was defined to supplement the Prolog file input/output model. A library of
miscellaneous procedures useful in debugging and making standard definitions was also developed. These
libraries are loaded automatically when the Prolog interpreter is invoked.
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4.  PROLOG  SUPPORT  LIBRARIES 


The Prolog support libraries use a standard set of data representations. In the libraries, instances of
PLEASE data types are represented as Prolog terms of the form "<data_type>(Value)". For example,
the integer 3 would be represented as "int(3)". The Prolog term is the most convenient way of representing
structured information, and it is useful to have the data type as the principal functor of the Prolog term
that represents an instance of a data type. This type information can be used as a selector in overloaded
functions (such as a generic pretty-printing procedure). The type information is also needed for
making the appropriate conversions of text output from Prolog into representations for other languages.
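To illustrate the last point: a caller in another language has to produce typed terms as text and also recover values from bindings such as "X=int(3)". The following minimal C sketch makes three assumptions. It handles only the int type, it expects each binding to arrive as one Var=int(N) string, and its function names (format_int_term, parse_int_binding) are hypothetical rather than part of the PLEASE libraries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the typed term for an integer, e.g. 3 -> "int(3)".
   Returns the snprintf result (characters needed, excluding '\0'). */
int format_int_term(char *out, size_t size, long value)
{
    assert(out != NULL);
    return snprintf(out, size, "int(%ld)", value);
}

/* Parse one binding such as "X=int(3)" for the variable `var`.
   Returns 1 and stores the integer on success; returns 0 if the text
   binds another variable or a term whose functor is not int. */
int parse_int_binding(const char *text, const char *var, long *value)
{
    size_t len = strlen(var);
    char *end;

    if (strncmp(text, var, len) != 0 || strncmp(text + len, "=int(", 5) != 0)
        return 0;
    text += len + 5;                   /* skip "Var=int(" */
    *value = strtol(text, &end, 10);
    return end != text && *end == ')'; /* digits present, then ')' */
}
```

Because the type name is the principal functor, checking the functor ("int(") before converting the value is what lets the caller reject a term of the wrong type.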

4.1.  Prolog  Representation  of  Lists,  Arrays,  Sets,  and  Maps 

Since  the  list  is  the  basic  data  structuring  tool  in  Prolog,  we  represent  all  PLEASE  data  types  in 
terms  of  the  Prolog  list.  In  Prolog,  a list  is  an  ordered  sequence  of  elements.  The  first  element  in  a list  is 
called  the  head  of  the  list.  The  tail  of  a list  is  the  remainder  of  a list  after  its  head  is  removed. 

A PLEASE  list  is  represented  in  Prolog  as: 
list( [Element, ..., Element] )

where all Elements are instances of some PLEASE data type. The library of list routines includes
procedures to create an empty list, find the head and tail of a list, append two lists, determine if two lists are
equal,  find  the  union  or  intersection  of  two  lists,  and  various  operations  to  index  the  elements  of  a list.  All 
operations  on  arrays,  sets,  and  maps  are  defined  in  terms  of  the  list  operations.  Therefore,  any  increase  in 
the  efficiency  of  the  list  operations  will  improve  the  performance  of  the  operations  on  other  data  types. 

The array is the principal built-in data structure of Pascal. A single-dimensioned array is provided
as a data type in PLEASE. An instance of an array includes its lower and upper bounds as well as the
items in the array:

array( int(Lowerbound), int(Upperbound), list( [Element,...,Element] ) )
The elements of an array may be instances of any PLEASE data type. The library of array routines
includes procedures for determining the size of an array, accessing the individual elements of an array,
checking that two arrays are equal, and modifying the contents of an array.
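The following minimal C sketch shows how the bounds stored in each array instance are used. It mirrors array size, indexing and overwriting for integer elements only. The struct and function names are hypothetical. Treating the size as Upperbound - Lowerbound + 1 (inclusive bounds) is an assumption, because this section does not say how the library computes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C mirror of array(int(Lower), int(Upper), list([...]))
   holding integer elements only. */
struct int_array {
    long lower;
    long upper;
    long *items;        /* upper - lower + 1 elements (assumed inclusive) */
};

/* Number of positions between the bounds. */
long int_array_size(const struct int_array *a)
{
    return a->upper - a->lower + 1;
}

/* Like array_index used as a function: returns 1 and sets *out on
   success, 0 (failure) if index lies outside [lower, upper]. */
int int_array_index(const struct int_array *a, long index, long *out)
{
    if (index < a->lower || index > a->upper)
        return 0;
    *out = a->items[index - a->lower];
    return 1;
}

/* Like array_overwrite: copy src into dst (whose items buffer must be
   large enough) and replace one element; src itself is not changed. */
int int_array_overwrite(const struct int_array *src, long index,
                        long value, struct int_array *dst)
{
    long n = int_array_size(src);

    if (index < src->lower || index > src->upper)
        return 0;
    assert(dst->items != NULL);
    memcpy(dst->items, src->items, (size_t)n * sizeof src->items[0]);
    dst->lower = src->lower;
    dst->upper = src->upper;
    dst->items[index - src->lower] = value;
    return 1;
}
```

The overwrite sketch writes into a second array rather than changing the first, because a Prolog term cannot be modified in place.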

A set  is  also  implemented  with  a PLEASE  list.  The  order  of  elements  in  a set  is  not  preserved  by  the 
operations  in  the  set  library.  There  are  two  set  representations. 

set( list( [Element,...,Element] ) )

is  an  instance  of  an  enumerated  set.  As  with  lists  and  arrays,  the  elements  of  the  set  may  be  instances  of 
any  PLEASE  data  type.  All  the  operations  in  the  set  library  manipulate  instances  of  enumerated  sets. 
There are operations to insert elements into and remove elements from a set, find all the members of a set, take the
union, intersection, or difference of two sets, create an empty set, and determine if two sets are equal. The
sets {4,3} and {3,4} are equal, and the equality routine will verify that two mathematically equivalent sets
are equal even if their elements are not enumerated in the same order. There is also a procedure for
determining if one set is a subset of another.
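Order-independent equality can be stated as mutual inclusion: each set is a subset of the other. The following minimal C sketch works over integer elements held in plain arrays. The names set_is_subset and set_equal are hypothetical C counterparts of the library procedures, and the sketch takes quadratic time.

```c
#include <assert.h>
#include <stddef.h>

/* Does x occur among the n elements of s? */
static int contains(const int *s, size_t n, int x)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == x)
            return 1;
    return 0;
}

/* Every element of a also occurs in b (order ignored). */
int set_is_subset(const int *a, size_t na, const int *b, size_t nb)
{
    assert(a != NULL || na == 0);
    for (size_t i = 0; i < na; i++)
        if (!contains(b, nb, a[i]))
            return 0;
    return 1;
}

/* Equal when each set is a subset of the other, so the enumeration
   order does not matter: {4,3} equals {3,4}. */
int set_equal(const int *a, size_t na, const int *b, size_t nb)
{
    return set_is_subset(a, na, b, nb) && set_is_subset(b, nb, a, na);
}
```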

For convenience, a second representation of sets, called the concise representation, is provided. Many
large sets are much too tedious to type in at a terminal (for example, imagine typing in the set of integers
from one to a hundred, or a thousand). These sets may be defined with a "low" or "first" element, a "high"
or "last" element, and two functions: one to generate the successor of an element of the set, and one to
determine when two elements of the set are equal. For example,


setc( int(1), int(100), int_next(_,_), int_equal(_,_) )
together  with 


int_next( int(X), int(XPlus1) ) :-
        XPlus1 is X + 1.
int_equal( int(X), int(X) ).
is a full specification for the set of integers from 1 to 100. The next function and equal function must be
asserted in the Prolog data base.² Note that the data type for an instance of a concise set is setc. Also,
note that the full procedure heads for the next and equal functions are used in the representation and that
the variables in the procedure heads are specified with underbars. To convert this concise representation
to a standard enumerated set, call the set_transform procedure in the library of set operations with the
setc term as the first argument. The second argument should be a variable and will be unified with a
completely instantiated enumerated set. Due to restrictions imposed by the Prolog interpreter currently in
use, it is unwise to have sets with more than about 100 elements.
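The expansion that set_transform performs amounts to applying the successor function repeatedly, starting from the low element, until the equality function accepts the high element. The following minimal C sketch uses integer elements, with function pointers standing in for the Prolog procedure heads. The C names are hypothetical. The capacity argument is also an added assumption: it reflects the practical limit of about 100 elements and stops the loop if High is never reached.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*next_fn)(int);
typedef int (*equal_fn)(int, int);

/* C counterparts of int_next and int_equal. */
int int_next(int x) { return x + 1; }
int int_equal(int x, int y) { return x == y; }

/* Expand setc(Low, High, Next, Equal) into out[0..n-1].
   Returns n, or -1 if more than cap elements would be produced
   (which also ends the loop when High is never reached). */
int set_transform(int low, int high, next_fn next, equal_fn equal,
                  int *out, int cap)
{
    int n = 0;
    int x = low;

    assert(cap >= 0);
    for (;;) {
        if (n == cap)
            return -1;
        out[n++] = x;
        if (equal(x, high))
            return n;
        x = next(x);
    }
}
```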

PLEASE  maps  also  have  two  representations.  The  standard  representation  is  a list  of  ordered  pairs: 

map(  list(  [ pair(Element, Element),  ...,  pair(Element, Element)]  ) ) 

where  each  element  is,  again,  an  instance  of  any  PLEASE  data  type.  The  first  element  in  each  pair  is  an 
element  of  the  domain  set  and  the  second  element  of  each  pair  is  an  element  of  the  range  set.  All  elements 
of  the  domain  set  should  be  of  the  same  type  and  all  elements  of  the  range  set  should  be  of  the  same  type. 
The  elements  in  the  pair  do  not  have  to  be  related  in  any  way  but  the  procedures  in  the  map  library 
assume  that  for  any  element  in  the  domain  set,  there  is  only  one  corresponding  element  in  the  range  set. 
There  are  procedures  in  the  map  library  for  finding  the  domain  and  range  sets  of  a map,  inserting  a pair  in 
the  map,  finding  a range  element  given  a domain  element,  and  changing  or  removing  a pair  in  the  map. 
There is also a procedure to transform a set into a map when a function is provided to take domain
elements and produce their corresponding range elements.

The concise representation for maps is very similar to the concise representation for sets. In
addition to the definitions for the next function and the equality predicate, a function definition must be
provided for the mapping of domain elements to range elements. For example,


² The successor and equality functions for integers are defined in the library of miscellaneous procedures (see
Appendix A).
mapc( int(1), int(20), int_next(_,_), int_equal(_,_), square(_,_) )

square( int(X), int(XSquared) ) :-
        XSquared is X * X.

is the concise representation for the mapping from the integers one to twenty to their corresponding
squares. The function map_transform in the map library takes a concise map as the first argument and
returns the corresponding standard map in the second argument.
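map_transform can be pictured the same way as the concise set expansion, except that each generated domain element is paired with the result of the mapping function. The following minimal C sketch also shows a lookup from a domain element to its range element. It is written for integers only, and struct pair, map_transform and map_lookup are hypothetical C names. As with sets, the capacity bound is an added assumption.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int domain; int range; };

int int_next(int x) { return x + 1; }
int int_equal(int x, int y) { return x == y; }
int square(int x) { return x * x; }

/* Expand mapc(Low, High, Next, Equal, Map) into pairs.
   Returns the number of pairs, or -1 if cap would be exceeded. */
int map_transform(int low, int high, int (*next)(int),
                  int (*equal)(int, int), int (*map)(int),
                  struct pair *out, int cap)
{
    int n = 0;
    int x = low;

    assert(cap >= 0);
    for (;;) {
        if (n == cap)
            return -1;
        out[n].domain = x;
        out[n].range = map(x);
        n++;
        if (equal(x, high))
            return n;
        x = next(x);
    }
}

/* Find the range element paired with key. The map library assumes at
   most one pair per domain element, so the first match is returned. */
int map_lookup(const struct pair *m, int n, int key, int *out)
{
    for (int i = 0; i < n; i++) {
        if (m[i].domain == key) {
            *out = m[i].range;
            return 1;
        }
    }
    return 0;
}
```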

4.2. Procedure Classification

The procedures in the Prolog libraries may be classified as functions, generators, or predicates. A
standard Prolog function returns one or more values when given one or more inputs. A generator takes
one or more non-variable arguments and successively unifies the other argument(s) with all the possible
values that satisfy the conditions. For example, set_member, a procedure in the set library, may be used
as a generator. When used as a generator, set_member takes a set as the first argument and a variable as
the second argument. Prolog will successively unify the second argument with each element of the set
during backtracking. For example, the query

set_member( set(list([int(1),int(7),int(4),int(5)])), X ) ?

will  yield  the  following  output: 

X=int(1)

X=int(7) 

X=int(4) 

X=int(5). 

A procedure  is  used  as  a predicate  when  all  arguments  are  completely  instantiated  (i.e.  there  are  no 
variable  terms  in  any  argument).  A predicate  is,  then,  a logical  expression  that  may  be  included  in  pre- 
and  post-conditions.  When  the  procedure  is  called  as  a predicate,  it  either  succeeds  or  fails.  Consider 
again set_member. If the first argument is a set and the second argument is an element, set_member will
succeed  only  if  the  element  is  contained  in  the  set.  For  example,  the  query 




set_member(  set(list([int(3),int(4),int(5)])),  int(3) ) ? 


will  succeed,  while  the  query 


set_member( set(list([int(3),int(4),int(5)])), int(1) ) ?


will  fail. 
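C has no backtracking, but the two calling modes of set_member can be approximated: a callback plays the role of the generator, and a boolean function plays the role of the predicate. The following minimal sketch works over integer elements. All of its names (set_member_each, set_member_test, collect) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Generator use: hand each element to visit in order, much as Prolog
   offers successive bindings of X on backtracking. Enumeration stops
   early if visit returns 0. */
void set_member_each(const int *s, size_t n,
                     int (*visit)(int, void *), void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (!visit(s[i], ctx))
            return;
}

/* Predicate use: both arguments are known; succeed (1) or fail (0). */
int set_member_test(const int *s, size_t n, int x)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == x)
            return 1;
    return 0;
}

/* Example visitor: collect up to 16 generated values. */
struct collector { int items[16]; int count; };

int collect(int x, void *ctx)
{
    struct collector *c = ctx;

    assert(c != NULL);
    if (c->count == 16)
        return 0;
    c->items[c->count++] = x;
    return 1;
}
```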

To document the use of a Prolog procedure, its parameters are annotated. In the synopsis
of the manual page entry for a library, each argument of a procedure is classified as an input parameter
"+input", an output parameter "-output", a generated output "-generated", or a template output
"-template". An input parameter is a completely instantiated Prolog term. An output parameter is a
variable, and the procedure will unify it with a value which is a completely instantiated Prolog term. A
generated output is a variable that will be unified with all possible values on backtracking (see the set_member
example, above).

For  example, 

foo( +input, +input, -output )
foo( +input, -generated, -generated )
foo( +input, +input, +input )

tells  us  that  the  procedure  "foo"  can  be  used  in  any  of  three  ways.  First,  if  the  first  two  arguments  are 
inputs, the third argument will receive an output value. This is an example of using a procedure as a
function. If only the first argument has a value, "foo" will generate values in the second and third arguments.
"Foo"  can  also  be  used  as  a predicate;  if  all  three  arguments  contain  input  values,  "foo"  will  either  succeed 
or  fail. 

In the future, we would like to investigate the use of templates as parameters. A template output is
a variable that will be unified with a partially instantiated Prolog term. A good example of template
output and its usefulness is the combined use of the list_hd and list_tl procedures from the list library:
list_hd(-template, +input)
list_tl(-template, +input)

If the second argument of the list_hd procedure is a completely instantiated Prolog term and the first
argument is a variable, the first argument will be unified with a list template: a PLEASE list with a variable
tail and the second argument of the list_hd procedure as the head. If the second argument of the list_tl
procedure is a PLEASE list and the first argument is a variable, the first argument will become a PLEASE
list with an uninstantiated head. We can use these together to create a new list.

For  example,  the  query 

list_hd( NewList, int(5) ),

list_tl(  NewList,  list(  [int(6),  int(7),  int(8)]  ) ) ? 

will  produce  the  output 

NewList=list([int(5),int(6),int(7),int(8)]) 

If the second arguments to list_hd and list_tl are instantiated with the head and tail of a list, respectively,
and the first argument of each procedure is the same Prolog variable, the variable will be unified with a
new list.

4.3. File Input/Output Library

The  Prolog  file  input/output  interface  is  quite  different  from  that  provided  in  conventional 
languages.  In  order  to  provide  a more  conventional  interface,  a suite  of  Prolog  procedures  simulating  the 
Pascal input/output model is provided for use in PLEASE prototypes. The fileio library provides
procedures for opening, closing, reading, and writing files.

A reset  operation  opens  a file  for  reading.  The  file  is  then  read  from  the  start.  A rewrite  operation 
opens  a file  for  writing.  The  file  is  written  from  the  start.  There  is  no  way  to  append  output  to  the  end  of 
a file. The Prolog interpreter currently in use restricts the number of files open for input and/or output at
any one time to 15.


The restriction that a reset or rewrite must be performed before reading or writing a file is enforced
in the following manner. A reset on Filename causes a clause "open_read(Filename)" to be asserted in the
Prolog data base. Whenever a rewrite is performed, "open_write(Filename)" is asserted. If the user
attempts to read a file with no open_read clause or write a file with no open_write clause, an error message
is printed, and the procedure returns an error. The cost of this error checking is one assertion for each
open operation and one unification for each read or write operation.

The Prolog procedure fileio_eval allows the read (fileio_read and fileio_readln) and write (fileio_write
and fileio_writeln) procedures to be called with a variable number of arguments. All calls to fileio_read,
fileio_readln, fileio_write, and fileio_writeln must be included in fileio_eval, as shown in the synopsis of the
manual entry for fileio_lib.

Fileio_read and fileio_readln read Prolog terms from the specified file. Each argument will receive
one Prolog term as a return value. If there are any terms remaining on a line after fileio_readln has unified
its arguments, they will be ignored. If the end of file is reached, every remaining argument will be unified
with the atom 'end_of_file'. If the file being read is not terminated with a newline, these procedures will
hang. Remember that fileio_read and fileio_readln must be called within fileio_eval.

Various errors are detected by the fileio library at run-time. Each routine in fileio_lib has an error
return  code.  When  an  error  is  detected,  a message  is  printed  on  standard  error  and  the  error  variable  is 
unified  with  the  name  of  the  routine  in  which  the  error  occurred.  If  no  error  occurs,  the  error  code  will  be 
set  to  the  Prolog  atom  ’false’. 
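The following minimal C sketch restates the open-before-use check and the error convention: "false" on success, or the name of the failing routine. It models only the bookkeeping and opens no files. Each reset or rewrite records an open_read or open_write fact, and each read or write looks for a matching fact. The sketch counts every recorded open against the limit of 15 and never closes anything. All function names are hypothetical, and a linear search stands in for Prolog's clause lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPEN 15     /* limit quoted for the interpreter in use */
#define MAX_NAME 256

enum open_mode { OPEN_READ, OPEN_WRITE };

/* One recorded fact: open_read(Name) or open_write(Name). */
struct open_fact {
    char name[MAX_NAME];
    enum open_mode mode;
};

static struct open_fact facts[MAX_OPEN];
static int fact_count = 0;

/* Record a fact; return "false" on success or the routine name on error. */
static const char *record_open(const char *routine, const char *name,
                               enum open_mode mode)
{
    assert(name != NULL);
    if (fact_count == MAX_OPEN || strlen(name) >= MAX_NAME) {
        fprintf(stderr, "%s: cannot open '%s'\n", routine, name);
        return routine;
    }
    strcpy(facts[fact_count].name, name);
    facts[fact_count].mode = mode;
    fact_count++;
    return "false";
}

/* The single check made before each read or write. */
static const char *check_open(const char *routine, const char *name,
                              enum open_mode mode)
{
    for (int i = 0; i < fact_count; i++)
        if (facts[i].mode == mode && strcmp(facts[i].name, name) == 0)
            return "false";
    fprintf(stderr, "%s: '%s' has not been opened\n", routine, name);
    return routine;
}

const char *sim_reset(const char *name)       { return record_open("fileio_reset", name, OPEN_READ); }
const char *sim_rewrite(const char *name)     { return record_open("fileio_rewrite", name, OPEN_WRITE); }
const char *sim_read_check(const char *name)  { return check_open("fileio_read", name, OPEN_READ); }
const char *sim_write_check(const char *name) { return check_open("fileio_write", name, OPEN_WRITE); }
```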

4.4.  Miscellaneous  Tools 

A useful  debugging  environment  is  also  being  developed  for  the  PLEASE  system.  A set  of  procedures 
for  manipulating  global  variables  has  been  developed.  These  global  procedures  are  extremely  useful  in 
debugging  and  may  be  used  to  implement  a type  of  call-by-reference  parameter  passing.  A global  variable 
is a Prolog term with the global variable name (assigned by the get_global routine) as the principal functor
and the value as the only argument in the term. For example, a global variable containing a single
integer,  3,  might  be: 

global0( int(3) ).

Values  can  be  assigned  to  global  variables  and  obtained  from  global  variables  using  operations  defined  in 
msc_lib. A procedure to dispose of a global variable is also provided.

Global  variables  are  useful  when  debugging  prototypes.  Long  instances  of  data  types  are  tedious  to 
type.  It  may  be  easier  to  assign  an  instance  of  a data  type  to  a global  variable  and  then  dereference  the 
global  variable  when  its  value  is  needed. 

Global variables can also be used to implement call-by-reference in PLEASE prototypes. An
argument to a procedure may be the name of a global variable. The variable may be dereferenced to obtain its
contents.  The  new  value  may  be  assigned  to  the  global  variable  before  the  procedure  returns.  This  is  also 
useful  in  reducing  the  traffic  in  procedure  calls  made  from  implementation  language  modules  through  the 
pipes.  Instead  of  passing  the  entire  value  of  a variable,  the  prototype  procedure  could  be  coded  to  operate 
on  call-by-reference.  Only  the  name  of  the  variable  is  passed  to  the  procedure.  It  is  then  dereferenced, 
modified,  and  stored  back  in  the  global  variable. 
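The following minimal C sketch shows the call-by-reference pattern. A named store hands out fresh names ("global0", "global1", ...), and a procedure that receives only the name dereferences the variable, modifies it, and stores the result back. The fixed-size table, the long-only values and every function name are assumptions made for illustration. They are not the interface of the library of miscellaneous procedures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_GLOBALS 32

struct global {
    char name[24];
    long value;
    int in_use;
};

static struct global globals[MAX_GLOBALS];
static int next_id = 0;

/* Create a variable holding value under a fresh name; NULL if full. */
const char *global_new(long value)
{
    for (int i = 0; i < MAX_GLOBALS; i++) {
        if (!globals[i].in_use) {
            snprintf(globals[i].name, sizeof globals[i].name,
                     "global%d", next_id++);
            globals[i].value = value;
            globals[i].in_use = 1;
            return globals[i].name;
        }
    }
    return NULL;
}

static struct global *global_find(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < MAX_GLOBALS; i++)
        if (globals[i].in_use && strcmp(globals[i].name, name) == 0)
            return &globals[i];
    return NULL;
}

/* Dereference: copy the current value out; 0 if no such variable. */
int global_get(const char *name, long *out)
{
    struct global *g = global_find(name);
    if (g == NULL)
        return 0;
    *out = g->value;
    return 1;
}

/* Assign a new value; 0 if no such variable. */
int global_set(const char *name, long value)
{
    struct global *g = global_find(name);
    if (g == NULL)
        return 0;
    g->value = value;
    return 1;
}

/* Dispose of a variable so its name no longer resolves. */
int global_dispose(const char *name)
{
    struct global *g = global_find(name);
    if (g == NULL)
        return 0;
    g->in_use = 0;
    return 1;
}

/* Call by reference: only the name is passed; the value is
   dereferenced, modified, and stored back before returning. */
int increment_by_name(const char *name)
{
    long v;
    if (!global_get(name, &v))
        return 0;
    return global_set(name, v + 1);
}
```

As the text notes for prototype procedures called through the pipes, only the short name crosses the interface, however large the stored value is.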
6. SUMMARY AND CONCLUSIONS


PLEASE is a programming language which supports a methodology similar to the Vienna Development
Method. PLEASE procedures may be specified with pre- and post-conditions written in predicate
logic. User-defined abstract data types called objects may have data type invariants. PLEASE
specifications may be transformed into executable prototypes written in Prolog. These prototypes are
useful in helping the developers deliver a system that satisfies the customers' desires. The specifications may
also be used with formal verification techniques to show that an implementation meets the requirements of
the specification.

The  PLEASE  data  types  list,  set,  and  map  are  conveniently  represented  in  Prolog.  Libraries  of 
standard  operations  on  these  data  types  have  been  developed.  A run-time  architecture  has  been  developed 
which  allows  Prolog  procedures  to  be  executed  from  standard  implementation  language  modules.  A 
library of procedures which simulate the standard Pascal input/output model was defined in order to
provide a more conventional i/o interface for PLEASE. A set of procedures to manipulate global variables
was  provided  to  facilitate  debugging  of  prototypes.  These  procedures  are  also  useful  in  implementing  a 
form  of  call-by-reference  in  PLEASE  prototypes. 

The  list  is  the  principal  data  structuring  device  of  Prolog.  PLEASE  lists  were  easily  represented  in 
terms  of  Prolog  lists.  PLEASE  sets  and  maps  were  then  defined  in  terms  of  PLEASE  lists.  The  operations 
on  sets  and  maps  were  implemented  in  terms  of  the  operations  on  lists.  There  is  a great  deal  of  room  for 
improving  the  efficiency  of  the  library  of  list  operations.  Since  the  other  operations  were  defined  with  the 
library  of  list  operations,  improvements  in  the  efficiency  of  list  operations  will  also  improve  the  efficiency 
of  operations  on  sets  and  maps. 

At present, instances of PLEASE data types contain structural type information. However, the
operations on the data type representations are not type-safe. For example, the function list_empty
creates a list with no elements, and it is not possible to determine whether that list is to be
a list of integers, a list of characters, or a list of some compound data type. Schemes for run-time type
checking were investigated, but we concluded that the overhead needed to provide this facility was too
great. We also investigated the run-time checking of name compatibility between types. This idea proved
to be extremely difficult.

The libraries developed for this thesis are a major step in the implementation of the PLEASE
programming language. PLEASE should provide an interesting vehicle for the study of top-down
development methodologies by the SAGA group. We feel these methodologies will enhance the software
development process.
NAME

array_lib - a Prolog library of array routines for PLEASE

SYNOPSIS 

Array  representation: 

array(int(Lowerbound),int(Upperbound),list([Element,...,Element])) 

array_size(+input, +input)
array_size(+input, -output)
array_size(array(...), int(Length))

array_member(+input, +input)
array_member(+input, -generated)
array_member(array(...), Element)

array_equal(+input, +input)
array_equal(+input, -output)
array_equal(array(...), array(...))

array_index(+input, +input, +input)
array_index(+input, +input, -output)
array_index(array(...), int(Index), Element)

array_overwrite(+input, +input, +input, -output)
array_overwrite(array(...), int(Index), Element, array(...))

DESCRIPTION 

Array_lib provides a Prolog library of predicates and functions to operate on arrays.

Array_lib is a library of array routines for the PLEASE system (see please_intro(1)). PLEASE is
an executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and
post-conditions written in predicate logic. These pre- and post-conditions are transformed into Prolog
and executed by the UNSW Prolog interpreter. Calls to array_lib functions may be included in
these prototypes.

Array_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). An array is
represented as a Prolog term of the form:

array(int(Lowerbound),int(Upperbound),list([Element,...,Element]))

where each Element is a Prolog term with data type information and a value. See lib_intro(3) for
more information about the Prolog representation of PLEASE data types and the libraries of
Prolog functions to operate on those data types.

The array library provides a predicate for determining if an element is present in an array. It also
provides functions to determine the size of an array, to return the contents of one of the positions of
the array, and to overwrite the contents of one of the positions of the array.

Array_size takes an array as its first argument and returns an integer (int(Value)) whose value is
the  size  of  the  array.  If  its  second  argument  is  instantiated  to  an  integer,  the  function  will  act  as  a 
predicate  to  determine  if  the  array  is  of  the  given  size. 

Array_member takes an array as its first argument and generates the members of the array in the
second  argument.  If  the  second  argument  is  not  a variable,  array_member  is  a predicate  that 
succeeds  if  the  element  is  in  the  array. 
Array_equal is a predicate that determines if two arrays are equal. Two arrays are equal if
corresponding elements are equal. If the second argument to array_equal is a variable, it will be
unified with the array given in the first argument.

Array_index takes an array as its first argument and an integer index as its second argument and
returns the element at that position in its third argument. If the third argument is not a variable,
array_index is a predicate that succeeds if the element is at that position in the array.

Array_overwrite takes an array as its first argument, an integer index as its second argument, and a
new element as its third argument, and returns in its fourth argument the first array with the new
element substituted at position index.

SEE  ALSO 

lib_intro(3), please_intro(1), encompass_intro(1), Programming in Prolog by Clocksin and Mellish
AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
NAME

c_to_plg  - functions  to  enable  execution  of  Prolog  commands  from  C 

SYNOPSIS 

#include "c_to_plg.h"

P_BUF inbuf, outbuf;

void c_to_plg_call(inbuf, outbuf)
P_BUF *inbuf;
P_BUF *outbuf;

void c_to_plg_debug(debug)
int debug; /* constant ON or OFF */

DESCRIPTION 

The c_to_plg functions provide a means for C programs to execute Prolog clauses.

The c_to_plg layer is one layer in the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and post-conditions
written in predicate logic. These pre- and post-conditions are transformed into Prolog and
executed by the UNSW Prolog interpreter.

C_to_plg_call is a text interface from C to Prolog: C and Prolog communicate through strings.
All strings must be terminated with a '\0'. C programs can send commands to Prolog to be
executed by using c_to_plg_call. The command is placed in the inbuf, and the results of the command
executed by Prolog are returned in the outbuf. A P_BUF contains a 4K-byte character string.
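Every command has to fit, with its terminating '\0', in the fixed 4K buffer, so a caller assembling a command from pieces needs a bounded append. The following minimal C sketch shows one. The cmd_buf type and its functions are hypothetical stand-ins, because the real P_BUF layout is defined in c_to_plg.h and is not reproduced here. With the real interface, the finished command would go in a P_BUF passed to c_to_plg_call as described above.

```c
#include <assert.h>
#include <string.h>

#define CMD_BUF_SIZE 4096   /* the 4K-byte size given for P_BUF */

/* Hypothetical stand-in for P_BUF. */
typedef struct { char text[CMD_BUF_SIZE]; } cmd_buf;

/* Start an empty, '\0'-terminated command. */
void cmd_buf_init(cmd_buf *b)
{
    b->text[0] = '\0';
}

/* Append s. Returns 0 on success, or -1 (leaving b unchanged) if the
   result plus its terminating '\0' would not fit in the buffer. */
int cmd_buf_append(cmd_buf *b, const char *s)
{
    size_t used = strlen(b->text);
    size_t add = strlen(s);

    assert(used < CMD_BUF_SIZE);
    if (add >= CMD_BUF_SIZE - used)
        return -1;
    memcpy(b->text + used, s, add + 1);   /* copies the '\0' too */
    return 0;
}
```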

The c_to_plg_debug function turns debugging on or off for the c_to_plg layer. If debug is set to
ON, a constant defined in the header file, debugging is turned on for c_to_plg. If debug is set to
OFF, also a constant defined in the header file, it is turned off.

FILES 

${ENCOMPASS}/include/c_to_plg.h
${ENCOMPASS}/lib/c_to_plg.o

SEE  ALSO 

please_intro(1), encompass_intro(1)

AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
NAME

fileio_lib - a library of Prolog routines to simulate the Pascal I/O interface in the PLEASE system
SYNOPSIS 

fileio_reset(+input, -output)
fileio_reset(Filename, Error)

fileio_rewrite(+input, -output)
fileio_rewrite(Filename, Error)

fileio_eval(fileio_write(+input,+input,+input,...,+input), -output)
fileio_eval(fileio_write(Filename,Arg1,Arg2,...,ArgN), Error)

fileio_eval(fileio_writeln(+input,+input,+input,...,+input), -output)
fileio_eval(fileio_writeln(Filename,Arg1,Arg2,...,ArgN), Error)

fileio_eval(fileio_read(+input,-output,-output,...,-output), -output)
fileio_eval(fileio_read(Filename,Arg1,Arg2,...,ArgN), Error)

fileio_eval(fileio_readln(+input,-output,-output,...,-output), -output)
fileio_eval(fileio_readln(Filename,Arg1,Arg2,...,ArgN), Error)

DESCRIPTION 

Fileio_lib provides a Prolog I/O library similar to that provided by Pascal.

Fileio_lib is a library of I/O routines for the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and
post-conditions written in predicate logic. These pre- and post-conditions are transformed into Prolog
and executed by the UNSW Prolog interpreter. Calls to fileio_lib functions may be included in
these prototypes.

Fileio_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). It provides
functions  for  opening,  closing,  reading,  and  writing  files.  The  I/O  routines  are  based  on  the  Pascal 
I/O  model. 

In fileio_lib, all parameters are input parameters except Arg1 through ArgN of fileio_read and
fileio_readln, and Error of all functions. Output parameters must be Prolog variables (i.e., their names
begin with a capital letter). Filenames are UNIX filenames. All filenames and literal output should be enclosed in quotes.

Fileio_reset opens a file for reading. Fileio_reset must be called before a read can be performed on
the file. Reading from the newly opened file begins at the start of the file.

Fileio_rewrite opens a file for writing. Fileio_rewrite must be called before a write can be
performed on the file. If the file already exists, its contents will be cleared and writing will begin at
the start of the file.

Fileio_eval allows the read (fileio_read and fileio_readln) and write (fileio_write and fileio_writeln)
functions to be called with a variable number of arguments. All calls to fileio_read, fileio_readln,
fileio_write, and fileio_writeln must be included in fileio_eval as shown in the SYNOPSIS.

Fileio_read and fileio_readln read Prolog terms from the specified file. Each argument will receive
one Prolog term as a return value. If there are any terms remaining on a line after fileio_readln
has unified its arguments, they will be ignored. If the end of file is reached, every remaining
argument will be unified with the atom 'end_of_file'. If the file being read is not terminated with a
newline, these functions will hang. Remember that fileio_read and fileio_readln must be called
within fileio_eval.
Fileio_write and fileio_writeln write their arguments to Filename. Fileio_writeln terminates its
output  with  a newline  whereas  fileio_write  does  not.  Remember  that  fileio_write  and  fileio_writeln 
must  be  called  within  fileio_eval. 

Each routine in fileio_lib has an error return code. If no error occurs, this will be set to the Prolog
atom  ’false’.  If  an  error  occurs,  the  name  of  the  routine  where  the  error  occurred  will  be  returned. 


DIAGNOSTICS

Various errors are detected by the fileio library at run-time. When an error is detected, a message
is  printed  on  standard  error  and  the  Error  variable  is  unified  with  the  name  of  the  routine  in 
which  the  error  occurred. 


SEE  ALSO 

please_intro(1), Programming in Prolog by Clocksin and Mellish


AUTHOR

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 

252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 


There are some constraints on the I/O library because it is coded in Prolog. There can be at most
15 files open simultaneously. Filenames are UNIX filenames and must be enclosed in single quotes.
There are some special restrictions on the fileio_read function. Fileio_read reads Prolog terms.
When the end of a file is reached, the value of every argument read after the end of file will be
'end_of_file'. If a file is not terminated by a newline, fileio_read will not detect end of file but will
hang instead.
NAME

lib_intro - a set of libraries of Prolog functions to provide an I/O interface for PLEASE and to
implement PLEASE data types

SYNOPSIS 

List  representation: 

list([Element,...,Element])

Array  representation: 

array(int(Lowerbound),int(Upperbound),list([Element,...,Element]))

Standard set representation:

set(list([Element,...,Element]))

Concise  set  representation: 

setc(LowElement,HighElement,NextFunction,EqualFunction) 

NextFunction=FnName(_,_) 

EqualFunction==FnName(_,_) 

Standard  map  representation: 

map(list([pair(DomainElement'RangeElement)'...' 

pair(DomainElement'RangeElement)])) 

Concise  map  representation: 

mapc(LowElement'HighElement'NextFunction'EqualFunction'MapFunction) 

NextFunction=FnName(_,_) 

EqualFunction=FnName(_,_) 

MapF  unction=F  nName(^  _) 

Function  description: 

function_jiame(  -hinput,  -output,  -generated) 
function_name(  argtype,  argtype,  argtype) 


DESCRIPTION 

Lib_intro describes a set of Prolog libraries of predicates and functions that define and operate on
PLEASE data types and that provide a Pascal-like I/O interface for PLEASE.

PLEASE  is  an  executable  specification  language.  It  is  an  extension  of  Path  Pascal  and  supports 
the  Vienna  Development  Method.  In  the  PLEASE  system,  programs  are  specified  using  pre-  and 
post-conditions  written  in  predicate  logic.  These  pre-  and  post-conditions  are  transformed  into 
Prolog  and  executed  by  the  UNSW  Prolog  interpreter.  Calls  to  library  functions  may  be  included 
in  these  prototypes. 

List_lib (see list_lib(3)) is a library of list routines for the PLEASE system (see please_intro(1)).
Array_lib (see array_lib(3)) is a library of array routines for the PLEASE system. Set_lib (see
set_lib(3)) is a library of set routines for the PLEASE system. Map_lib (see map_lib(3)) is a library
of map routines for the PLEASE system. Fileio_lib (see fileio_lib(3)) is a library of file
input/output operations for the PLEASE system.

PLEASE data types are represented in Prolog as <type>(Value). For example, the integer 3
would be represented as int(3).
A list is represented as a Prolog term of the form:
list([Element,...,Element])

where each Element is a PLEASE data type, i.e. a Prolog term with type information and a value,
<type>(Value).

For  example,  after  the  following  PLEASE  code  fragment  has  been  executed: 

type integer_list = list of integer ;

variable i : integer_list ;

begin

i := <1, 2> ;
i would be represented as:
list([int(1),int(2)]).

An  array  is  represented  as  a Prolog  term  of  the  form: 

array(int(Upperbound),int(Lowerbound),list([Element,...,Element])) 

where Upperbound is the highest index in the array and Lowerbound is the lowest. Each Element
is a PLEASE data type, i.e. a Prolog term with type information and a value, <type>(Value).
Arrays are one-dimensional, but an array of arrays can be constructed.

There are two representations for sets. A set can be described with the standard set notation or
with the concise notation. All set operations are performed on sets in the standard notation. The
concise notation is useful for describing large sets (e.g., the set of integers from 1 to 100). The set
library has a function called set_transform that transforms a set in the concise notation into a
standard set representation.

A standard  set  is  represented  as  a Prolog  term  of  the  form: 
set(list([Element,...,Element])) 

where each Element is a PLEASE data type, i.e. a Prolog term with type information and a value,
<type>(Value).

For  example,  after  the  following  PLEASE  code  fragment  has  been  executed: 

type  integer_set  = set  of  integer  ; 

variable  s : integer_set  ; 

begin 

s := set_union({1},{3,4}) ;
s might be represented as:

set(list([int(3),int(1),int(4)])).

In sets, the order of occurrence of elements is not significant. Sets are not multisets; a set
contains at most one occurrence of each element.
To describe a set in the concise notation, the user must provide a low element, a high element, a
function  that  produces  the  successor  to  a set  element,  and  a function  that  determines  when  two 
elements  are  equal.  For  example,  the  following  Prolog  code  describes  the  set  of  integers  from  1 to 
100: 

next_int(int(X),int(XPlus1)) :- XPlus1 is X+1.
equal_int(int(X),int(X)).

Set=setc(int(1),int(100),next_int(_,_),equal_int(_,_)).

Calling set_transform with Set as the first argument will produce the set of integers from 1 to 100
in the standard representation.
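The expansion that set_transform performs can be sketched in Prolog as follows. This is a minimal illustration, not the library's own source: it assumes a modern ISO-style Prolog such as SWI-Prolog rather than UNSW Prolog, assumes the high element is reachable from the low element by repeated calls to the next function, and uses a hypothetical helper, call_template/3, to call a head template such as next_int(_,_) with concrete arguments.

```prolog
% Successor and equality functions from the example above.
next_int(int(X), int(XPlus1)) :- XPlus1 is X + 1.
equal_int(int(X), int(X)).

% call_template(+Template, ?A, ?B): hypothetical helper that calls a
% two-argument head template such as next_int(_,_) with A and B.
call_template(Template, A, B) :-
    copy_term(Template, Goal),
    arg(1, Goal, A),
    arg(2, Goal, B),
    call(Goal).

% set_transform(+Concise, -Standard): start at the low element and apply
% the next function until the equal function matches the high element.
set_transform(setc(Low, High, Next, Equal), set(list(Elements))) :-
    enumerate(Low, High, Next, Equal, Elements).

enumerate(Current, High, _, Equal, [Current]) :-
    call_template(Equal, Current, High),
    !.
enumerate(Current, High, Next, Equal, [Current|Rest]) :-
    call_template(Next, Current, Successor),
    enumerate(Successor, High, Next, Equal, Rest).
```

For example, set_transform(setc(int(1),int(5),next_int(_,_),equal_int(_,_)),S) binds S to set(list([int(1),int(2),int(3),int(4),int(5)])). If the high element can never be reached, this sketch does not terminate.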

A map  is  represented  as  a Prolog  term  of  the  form: 

map(list([pair(DomainElement,RangeElement),..., 

pair(DomainElement,RangeElement)])). 

Each domain and range element is a PLEASE data type, i.e. a Prolog term with type information
and a value, <type>(Value).

For  example,  after  the  following  PLEASE  code  fragment  has  been  executed: 

type squares_map = map from integer to integer ;
domain_set  = set  of  integer  ; 
variable  s : squares_map  ; 
d : domain_set  ; 

function  square(  x : integer  ) : integer 
begin 

square  :=  x*x 
end  ; 
begin 

d :=  {1,2, 3, 4, 5}  ; 
s :=  map_construct(d, square)  ; 

s might  be  represented  as: 

map(list([pair(int(1),int(1)),pair(int(2),int(4)),

pair(int(3),int(9)),pair(int(4),int(16)),

pair(int(5),int(25))])).

To describe a map in the concise notation, the user must provide a low element for the domain set,
a high element for the domain set, a function that produces the successor to a set element, a
function that determines when two elements are equal, and a function that takes a domain element
and returns the corresponding range element (the map function). The following Prolog code
describes the mapping from the set of integers from 1 to 100 to their squares:

next_int(int(X),int(XPlus1)) :- XPlus1 is X+1.
equal_int(int(X),int(X)).

square_int(int(X),int(XSquared)) :- XSquared is X*X.

Map=mapc(int(1),int(100),next_int(_,_),equal_int(_,_),square_int(_,_)).


33 


lib  Jntro  ( 3 ) 


UNIX  Programmer’s  Manual 


lib  Jntro  ( 3 ) 


Calling map_transform with Map as the first argument would generate the standard representation
for the map from the set of integers from 1 to 100 to their squares. There is also a function
map_construct that takes a set in the standard notation and the map function and generates the
map with that set as the domain (see map_lib(3)).
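A similar sketch for map_transform walks the domain from the low element to the high element and pairs each domain element with the result of the map function. As before, this assumes an ISO-style Prolog such as SWI-Prolog, and call_template/3 and pairs_from/6 are hypothetical helpers rather than part of map_lib.

```prolog
% Functions from the concise map example above.
next_int(int(X), int(XPlus1)) :- XPlus1 is X + 1.
equal_int(int(X), int(X)).
square_int(int(X), int(XSquared)) :- XSquared is X * X.

% call_template(+Template, ?A, ?B): call a two-argument head template.
call_template(Template, A, B) :-
    copy_term(Template, Goal),
    arg(1, Goal, A),
    arg(2, Goal, B),
    call(Goal).

% map_transform(+Concise, -Standard)
map_transform(mapc(Low, High, Next, Equal, MapFn), map(list(Pairs))) :-
    pairs_from(Low, High, Next, Equal, MapFn, Pairs).

% pairs_from(+D, +High, +Next, +Equal, +MapFn, -Pairs): one pair(D, R)
% per domain element from D up to and including High.
pairs_from(D, High, _, Equal, MapFn, [pair(D, R)]) :-
    call_template(Equal, D, High),
    !,
    call_template(MapFn, D, R).
pairs_from(D, High, Next, Equal, MapFn, [pair(D, R)|Rest]) :-
    call_template(MapFn, D, R),
    call_template(Next, D, D1),
    pairs_from(D1, High, Next, Equal, MapFn, Rest).
```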

Each function library has its own manual entry. Each function in the library is described briefly
in the synopsis. The first few lines of each synopsis entry describe how the arguments are to be
used: for each argument, +input, -output, -template, or -generated indicates how it can be used.
An argument that is marked "+input" should be a completely instantiated Prolog term; in other
words, the term should contain no variables or underscores. Arguments marked "-output" and
"-generated" should be Prolog variables. A "-template" argument returns a partially instantiated
Prolog term. The most useful instances of -template arguments are the list_hd and list_tl
functions, which can be used together to create a new list given a head and a tail (see
list_lib(3)). In many functions, the arguments can be used in various combinations of input,
output, and generated. Functions have varying numbers of arguments. For example,

foo(+input, +input, -output)
foo(+input, -generated, -generated)
foo(+input, +input, +input)

tells us that the function "foo" can be used in any of three ways. First, if the first two arguments
are inputs, the third argument will receive an output value. If only the first argument has a value,
"foo" will generate (see generators, below) values in the second and third arguments. "Foo" can
also be used as a predicate. That is, if all three arguments contain input values, "foo" will either
evaluate to true or false.

Some library functions can be used as generators. A generator takes one or more non-variable
arguments and successively unifies the other argument(s) with all possible values that satisfy
the conditions. For example, set_member may be used as a generator. When used as a generator,
set_member takes a set as the first argument and a variable as the second argument. The second
argument will be successively unified with each element of the set during backtracking. For
example:

set_member(set(list([int(1),int(7),int(4),int(5)])),X) ?

X=int(1)

X=int(7) 

X=int(4) 

X=int(5). 

Some functions can also be used as predicates. A function is used as a predicate when none of the
arguments are variables. When the function is called, it either succeeds or fails. Consider again
set_member. If the first argument is a set and the second argument is an element, set_member will
succeed if the element is contained in the set. If the element is not contained in the set,
set_member will fail. For example:

set_member(set(list([int(3),int(4),int(5)])),int(3)) ?

** yes.

set_member(set(list([])),int(3)) ?

**  no. 
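Both uses of set_member follow from ordinary Prolog backtracking. The sketch below is an assumption about how such a predicate could be written in an ISO-style Prolog, not the set_lib source; member_of/2 and all_members/2 are hypothetical helpers, the latter using findall/3 to collect every value the generator produces.

```prolog
% set_member(+Set, ?Element): with Element unbound, each element of the
% standard set is returned in turn on backtracking; with Element bound,
% the call simply succeeds or fails.
set_member(set(list(Elements)), Element) :-
    member_of(Element, Elements).

member_of(X, [X|_]).
member_of(X, [_|T]) :- member_of(X, T).

% all_members(+Set, -Members): collect every generated element.
all_members(Set, Members) :-
    findall(X, set_member(Set, X), Members).
```

Here all_members(set(list([int(1),int(7),int(4),int(5)])), Xs) collects the same four bindings shown in the generator example, in the same order.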
SEE ALSO

please_intro(1), encompass_intro(1), array_lib(3), list_lib(3), set_lib(3), map_lib(3), fileio_lib(3),
prolog(1), Programming in Prolog by Clocksin and Mellish

AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
NAME

list_lib - a Prolog library of list routines for PLEASE

SYNOPSIS 

List representation:

list([Element,...,Element])

list_len(+input,+input)
list_len(+input,-output)
list_len(list([...]),int(Length))

list_equal(+input,+input)
list_equal(+input,-output)
list_equal(list([...]),list([...]))

list_member(+input,+input)
list_member(+input,-generated)
list_member(list([...]),Element)

list_hd(+input,+input)
list_hd(+input,-output)
list_hd(-template,+input)
list_hd(list([...]),Head)

list_tl(+input,+input)
list_tl(+input,-output)
list_tl(-template,+input)
list_tl(list([...]),Tail)

list_index(+input,+input,+input)
list_index(+input,+input,-output)
list_index(+input,-output,+input)
list_index(list([...]),int(Position),Element)

list_overwrite(+input,+input,+input,-output)
list_overwrite(list([...]),int(Position),Element,list([...]))

list_empty(+input)
list_empty(-output)
list_empty(list([...]))

list_append(+input,+input,-output)

list_append(list([...]),list([...]),list([...]))

list_intersect(+input,+input,-output)
list_intersect(list([...]),list([...]),list([...]))

list_difference(+input,+input,-output)
list_difference(list([...]),list([...]),list([...]))

list_union(+input,+input,-output)
list_union(list([...]),list([...]),list([...]))
DESCRIPTION

List_lib provides a Prolog library of predicates and functions to operate on lists.

List_lib is a library of list routines for the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and
post-conditions written in predicate logic. These pre- and post-conditions are transformed into
Prolog and executed by the UNSW Prolog interpreter. Calls to list_lib functions may be included
in these prototypes.

List_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). A list is
represented as a Prolog term of the form:

list([Element,...,Element])

where each Element is a Prolog term with data type information and a value (i.e.,
<type>(Value)). See lib_intro(3) for more information about the Prolog representation of
PLEASE data types and the libraries of Prolog functions to operate on those data types.

The list library provides predicates for determining if an element is present in a list or if a list is
empty. The list library provides functions to determine the length of a list, the head of a list (its
first element), or the tail of a list (a list containing all the elements except the first). The list
library also provides functions for appending two lists to form a new list; constructing a list
containing the elements that are in both of two lists; constructing a list containing the elements
that are in one list but not in another; and forming a list containing the elements of two lists but
only one occurrence of each (i.e., merging two lists).

List_len takes a list as its first argument and returns an integer (int(Value)) whose value is the
length of the list. If its second argument is instantiated to an integer, the function is a predicate
that succeeds if the list is of the given length.
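One way to write list_len so that it works both as a function and as a predicate is to count with an accumulator and let head unification compare the result with any length already supplied. This is a sketch assuming an ISO-style Prolog such as SWI-Prolog; count_elements/3 is a hypothetical helper.

```prolog
% list_len(+List, ?Length): Length is int(N), where N is the number of
% elements. If Length is already bound, the call acts as a predicate.
list_len(list(Elements), int(N)) :-
    count_elements(Elements, 0, N).

% count_elements(+List, +Acc, -N): tail-recursive element count.
count_elements([], N, N).
count_elements([_|T], Acc, N) :-
    Acc1 is Acc + 1,
    count_elements(T, Acc1, N).
```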

List_equal  is  a predicate  if  both  arguments  are  instantiated.  It  succeeds  if  the  two  lists  are  equal. 
If  the  second  argument  is  a variable,  it  will  be  unified  with  the  first  argument. 

List_member  takes  a list  as  its  first  argument  and  generates  the  members  of  the  list  in  its  second 
argument.  If  its  second  argument  is  not  a variable,  list_member  will  act  as  a predicate,  succeeding 
if  the  element  is  a member  of  the  list. 

List_hd takes a list as its first argument and returns the first element of the list as its second
argument. If both arguments are instantiated, list_hd acts as a predicate which succeeds if the
second argument is the head of the list (the first argument).

List_tl takes a list as its first argument and returns the tail of that list as its second argument (the
tail of a list is the list with its first element removed). If both arguments are instantiated, list_tl
acts as a predicate which is true if the second argument is the tail of the list.

List_hd and list_tl can be used together in the following fashion. Suppose we wanted to create a
new list that had X as its head and the list Y as its tail. NewList is a template returned by each
function. By giving the template the same name in both calls, the Prolog unification operation
fills in the empty slots to produce a completely instantiated list. We could do this with the
following calls:

list_hd(NewList,X), list_tl(NewList,Y).
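The template technique works when list_hd and list_tl are plain unification patterns over the list(...) wrapper. Assuming definitions like the following (a sketch in ISO-style Prolog, not necessarily the list_lib source), the first call binds NewList to list([X|_]) and the second fills in the remaining slot. make_list/3 is a hypothetical wrapper for the two calls.

```prolog
% list_hd(?List, ?Head): Head is the first element of List.
list_hd(list([Head|_]), Head).

% list_tl(?List, ?Tail): Tail is List with its first element removed.
list_tl(list([_|Tail]), list(Tail)).

% make_list(+Head, +Tail, -NewList): the template technique in one call.
make_list(Head, Tail, NewList) :-
    list_hd(NewList, Head),
    list_tl(NewList, Tail).
```

With X = int(1) and Y = list([int(2),int(3)]), the two calls leave NewList = list([int(1),int(2),int(3)]).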


List_index is a predicate that succeeds if the Element given in the third argument is at the position
given in the second argument of the list given as the first argument. If the third argument is a
variable, it will be unified with the element of the list at the position given in the second
argument. If the second argument is a variable, it will be unified with the position of the first
element in the list equal to the third argument.

List_overwrite creates a new list by replacing the element of the list at Position (the second
argument) with the element given as the third argument. This new list is returned as the fourth
argument.
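The manual does not say whether list positions start at 0 or 1; the sketch below assumes 1-based positions and an ISO-style Prolog. It shows how one clause of list_index can serve all three modes listed in the synopsis, and how list_overwrite can rebuild the list around the replaced position. index_from/4 and replace_at/5 are hypothetical helpers.

```prolog
% list_index(+List, ?Position, ?Element), positions assumed 1-based.
% The cut keeps only the first solution, so with Position unbound the
% result is the position of the first element equal to Element.
list_index(list(Elements), int(Pos), Element) :-
    index_from(Elements, 1, Pos, Element),
    !.

% index_from(+List, +I, ?Pos, ?E): E is at position Pos, counting from I.
index_from([E|_], I, I, E).
index_from([_|T], I, Pos, E) :-
    I1 is I + 1,
    index_from(T, I1, Pos, E).

% list_overwrite(+List, +Position, +Element, -NewList)
list_overwrite(list(Elements), int(Pos), Element, list(NewElements)) :-
    replace_at(Elements, 1, Pos, Element, NewElements).

% replace_at(+List, +I, +Pos, +E, -New): copy List, putting E at Pos.
replace_at([_|T], Pos, Pos, E, [E|T]) :-
    !.
replace_at([H|T], I, Pos, E, [H|T1]) :-
    I1 is I + 1,
    replace_at(T, I1, Pos, E, T1).
```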

List_empty  is  true  if  its  argument  contains  no  elements.  If  its  argument  is  a variable,  list_empty 
will  unify  it  with  an  empty  list. 

List_append creates a new list (its third argument) by appending two lists (its first two
arguments).

List_intersection creates a new list (its third argument) which contains all the elements that are
present in both of the lists passed in as its first two arguments. Each element will occur only once
in the new list.

List_difference  creates  a new  list  (its  third  argument)  which  contains  all  the  elements  that  are  in  its 
first  argument  and  that  are  not  in  its  second  argument. 

List_union creates a new list (its third argument) which contains all the elements in the first two
arguments (lists). Each element occurs only once in the new list.
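The four combining operations can be sketched with a membership test and a duplicate filter. This assumes an ISO-style Prolog, and the helper names (in_list/2, join/3, unique/2, keep_common/3, drop_common/3) are hypothetical. The synopsis spells the intersection routine list_intersect, so the sketch uses that name. In this sketch list_difference keeps the surviving elements of the first list in their original order.

```prolog
% in_list(?X, +List): membership test, defined here to stay self-contained.
in_list(X, [X|_]).
in_list(X, [_|T]) :- in_list(X, T).

% join(+A, +B, -C): C is A followed by B.
join([], L, L).
join([H|T], L, [H|R]) :- join(T, L, R).

% unique(+In, -Out): keep only the first occurrence of each element.
unique(In, Out) :- unique(In, [], Out).

unique([], _, []).
unique([X|T], Seen, Out) :-
    (   in_list(X, Seen)
    ->  Out = Rest
    ;   Out = [X|Rest]
    ),
    unique(T, [X|Seen], Rest).

% keep_common(+A, +B, -C): elements of A that also occur in B.
keep_common([], _, []).
keep_common([X|T], B, Out) :-
    (   in_list(X, B)
    ->  Out = [X|Rest]
    ;   Out = Rest
    ),
    keep_common(T, B, Rest).

% drop_common(+A, +B, -C): elements of A that do not occur in B.
drop_common([], _, []).
drop_common([X|T], B, Out) :-
    (   in_list(X, B)
    ->  Out = Rest
    ;   Out = [X|Rest]
    ),
    drop_common(T, B, Rest).

list_append(list(A), list(B), list(C)) :-
    join(A, B, C).

list_intersect(list(A), list(B), list(C)) :-
    keep_common(A, B, Common),
    unique(Common, C).

list_difference(list(A), list(B), list(C)) :-
    drop_common(A, B, C).

list_union(list(A), list(B), list(C)) :-
    join(A, B, Both),
    unique(Both, C).
```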

SEE  ALSO 

lib_intro(3), please_intro(1), encompass_intro(1), Programming in Prolog by Clocksin and Mellish
AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
NAME

map_lib - a Prolog library of map routines for PLEASE
SYNOPSIS 

Standard map representation:

map(list([pair(DomainElement,RangeElement),...,

pair(DomainElement,RangeElement)]))

Concise map representation:

mapc(LowElement,HighElement,NextFunction,EqualFunction,MapFunction)
NextFunction=FnName(_,_)

EqualFunction=FnName(_,_)

MapFunction=FnName(_,_)


map_transform(+input,-output)
map_transform(mapc(...),map(...))

map_construct(+input,+input,-output)
map_construct(set(...),MapFunction,map(...))

MapFunction=FnName(_,_)

map_empty(+input)
map_empty(-output)
map_empty(map(...))

map_domain(+input,-output)
map_domain(map(...),set(...))

map_range(+input,-output)
map_range(map(...),set(...))

map_overwrite(+input,+input,-output)

map_overwrite(map(...),pair(DomainElement,RangeElement),map(...))

map_apply(+input,+input,+input)
map_apply(+input,+input,-output)
map_apply(map(...),DomainElement,RangeElement)

DESCRIPTION 

Map_lib provides a Prolog library of predicates and functions to operate on maps.

Map_lib is a library of map routines for the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and
post-conditions written in predicate logic. These pre- and post-conditions are transformed into
Prolog and executed by the UNSW Prolog interpreter. Calls to map_lib functions may be included
in these prototypes.

Map_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). There are two
representations for maps, a standard representation and a concise representation. All map
operations are performed on the standard representation of a map. The standard representation of a
map contains a list that enumerates the pairs of elements of the map. The concise representation
of a map includes the low element in the domain set, the high element in the domain set, the head
of a clause that will produce the "next" element of the domain set, the head of a clause that will
determine if two elements of the domain set are "equal", and the head of a clause that will return
the range element that corresponds to the domain element. The concise representation provides a
means for giving a short description of a large map (too large to enumerate). See lib_intro(3) for a
description of PLEASE data types and general information about operations on those data types.

The map library provides a predicate for determining if a map is empty. The map library
provides functions for finding the domain or range of a map, overwriting a pair in a map, or
finding the range element that corresponds to a given domain element of a map. All of these
operations work on standard map representations. There is a function that converts a concise
representation into a standard representation.

Map_transform takes a concise map representation as its first argument and returns the
corresponding standard map representation as its second argument. It is important to remember
that if a concise map representation is given, the user MUST provide function definitions for the
next function, the equal function, and the mapping itself.

Map_construct takes a standard set (the domain set, see set_lib(3)) as its first argument and the
head of a function that describes the map as its second argument, and returns a standard map as
the third argument. The standard map is constructed by applying the function to each element of
the standard set.
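A sketch of map_construct applies the map function's head template to each domain element in turn. It assumes an ISO-style Prolog; square/2, call_template/3, and build_pairs/3 are hypothetical names used only for illustration.

```prolog
% Hypothetical map function for illustration: int(X) -> int(X*X).
square(int(X), int(Y)) :- Y is X * X.

% call_template(+Template, ?A, ?B): call a two-argument head template.
call_template(Template, A, B) :-
    copy_term(Template, Goal),
    arg(1, Goal, A),
    arg(2, Goal, B),
    call(Goal).

% map_construct(+DomainSet, +MapFunction, -Map)
map_construct(set(list(Domain)), MapFn, map(list(Pairs))) :-
    build_pairs(Domain, MapFn, Pairs).

% build_pairs(+Domain, +MapFn, -Pairs): one pair(D, R) per element.
build_pairs([], _, []).
build_pairs([D|Ds], MapFn, [pair(D, R)|Ps]) :-
    call_template(MapFn, D, R),
    build_pairs(Ds, MapFn, Ps).
```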

Map_empty  succeeds  if  its  argument  (a  standard  map  representation)  is  empty.  If  its  argument  is 
a variable,  it  will  be  instantiated  to  an  empty  map. 

Map_domain takes a standard map as its first argument and returns a standard set that is the
domain of the map. Map_range takes a standard map as its first argument and returns a standard
set that is the range of the map.
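Projecting the pairs gives the domain and the range. Because two domain elements may map to the same range element (for example, int(-1) and int(1) under squaring), the range must be reduced to one occurrence per element to be a valid set. The sketch assumes an ISO-style Prolog, that the map holds each domain element once, and hypothetical helpers domain_of/2, range_of/2, unique/2, and in_list/2.

```prolog
% map_domain(+Map, -DomainSet)
map_domain(map(list(Pairs)), set(list(Domain))) :-
    domain_of(Pairs, Domain).

% map_range(+Map, -RangeSet)
map_range(map(list(Pairs)), set(list(Range))) :-
    range_of(Pairs, AllRange),
    unique(AllRange, Range).   % a set holds each element once

domain_of([], []).
domain_of([pair(D, _)|T], [D|Ds]) :- domain_of(T, Ds).

range_of([], []).
range_of([pair(_, R)|T], [R|Rs]) :- range_of(T, Rs).

% unique(+In, -Out): keep the first occurrence of each element.
unique(In, Out) :- unique(In, [], Out).

unique([], _, []).
unique([X|T], Seen, Out) :-
    (   in_list(X, Seen)
    ->  Out = Rest
    ;   Out = [X|Rest]
    ),
    unique(T, [X|Seen], Rest).

in_list(X, [X|_]).
in_list(X, [_|T]) :- in_list(X, T).
```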

Map_overwrite takes a standard map as its first argument and a mapping pair as its second
argument, and returns a new map. If the domain element exists in the map, the range element will
be replaced by the new range element in the mapping pair. If the domain element does not exist in
the map, the mapping pair is inserted. If the range element is "_", the mapping pair for the
domain element is removed from the map.

Map_apply takes a standard map as its first argument and a domain element as its second
argument, and returns the range element that corresponds to the domain element as its third
argument. If the third argument is not a variable, map_apply is a predicate that succeeds if the
third argument is the range element that corresponds to the domain element.
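map_overwrite and map_apply can be sketched as a walk over the pair list. This assumes an ISO-style Prolog, that domain elements are ground so that ==/2 can compare them, and that the special range element which requests removal is an unbound variable (written _). put_key/4, remove_key/3, and lookup/3 are hypothetical helpers, and new pairs are appended at the end, which is an arbitrary choice.

```prolog
% map_apply(+Map, +DomainElement, ?RangeElement): look up the range
% element; with RangeElement bound, the call acts as a predicate.
map_apply(map(list(Pairs)), D, R) :-
    lookup(Pairs, D, R).

lookup([pair(K, V)|T], D, R) :-
    (   K == D
    ->  R = V
    ;   lookup(T, D, R)
    ).

% map_overwrite(+Map, +pair(D, R), -NewMap): replace or insert the pair;
% an unbound R (the assumed removal marker) deletes the pair for D.
map_overwrite(map(list(Pairs)), pair(D, R), map(list(NewPairs))) :-
    (   var(R)
    ->  remove_key(Pairs, D, NewPairs)
    ;   put_key(Pairs, D, R, NewPairs)
    ).

put_key([], D, R, [pair(D, R)]).
put_key([pair(K, V)|T], D, R, Out) :-
    (   K == D
    ->  Out = [pair(D, R)|T]
    ;   Out = [pair(K, V)|Rest],
        put_key(T, D, R, Rest)
    ).

remove_key([], _, []).
remove_key([pair(K, V)|T], D, Out) :-
    (   K == D
    ->  Out = T
    ;   Out = [pair(K, V)|Rest],
        remove_key(T, D, Rest)
    ).
```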

SEE  ALSO 

lib_intro(3), please_intro(1), encompass_intro(1), Programming in Prolog by Clocksin and Mellish
AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252 Digital Computer Laboratory, 1304 West Springfield Avenue, Urbana, IL 61801.
NAME

msc_lib - a library of miscellaneous routines for PLEASE
SYNOPSIS 

% Global variable manipulation

get_global(-output)

get_global(Name)

allocate_global(+input)

allocate_global(Name)

assign_global(+input,+input)
assign_global(Name,Value)

value_global(+input,-output)
value_global(Name,Value)

remove_global(+input)
remove_global(Name)

% Useful operations on integers

int_equal(+input,+input)

int_equal(int(X),int(X))

int_next(+input,-output)
int_next(int(X),int(Y))

int_prev(+input,-output)
int_prev(int(X),int(Y))

DESCRIPTION 

Msc_lib provides a Prolog library of predicates and functions to perform various miscellaneous
operations.

Msc_lib is a library of miscellaneous routines for the PLEASE system (see please_intro(1)).
PLEASE is an executable specification language. It is an extension of Path Pascal and supports
the Vienna Development Method. In the PLEASE system, programs are specified using pre- and
post-conditions written in predicate logic. These pre- and post-conditions are transformed into
Prolog and executed by the UNSW Prolog interpreter. Calls to msc_lib functions may be included
in these prototypes.

Msc_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). One group of
miscellaneous functions is used to allocate, manipulate, and deallocate global variables.

Global variables are very useful in prototyping PLEASE specifications. The representations of lists
and sets can sometimes be very long and tedious to type, and global variables provide an easy way
to manipulate these large representations. Use get_global(Name) to get a global name; Name will be
unified with the name of the global variable (global0, global1, ...). To allocate a global with a
name of your own choosing, use allocate_global(Name). Use assign_global to assign a value to a
global variable. Suppose you wanted to assign the term

"function_names(push, pop, create, destroy)" to a global variable. First type "get_global(Name)?" to
allocate a global variable. Then type
"assign_global(Name, function_names(push, pop, create, destroy))?" where Name is the global name
returned by the call to get_global. Use value_global(Name, Value) to find the current value of a
global variable. Use remove_global(Name) to deallocate a global variable.
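The global-variable routines map naturally onto Prolog's dynamic database. The sketch below is one plausible implementation using assertz/1 and retract/1 in an ISO-style Prolog such as SWI-Prolog; the fact names global_allocated/1, global_value/2, and global_counter/1 are hypothetical, and the numbering simply follows the global0, global1, ... names described above.

```prolog
:- dynamic(global_allocated/1).
:- dynamic(global_value/2).
:- dynamic(global_counter/1).

global_counter(0).

% get_global(-Name): allocate the next generated name (global0, ...).
% Fails if that name was already taken with allocate_global/1.
get_global(Name) :-
    retract(global_counter(N)),
    N1 is N + 1,
    assertz(global_counter(N1)),
    number_codes(N, Digits),
    atom_codes(Suffix, Digits),
    atom_concat(global, Suffix, Name),
    allocate_global(Name).

% allocate_global(+Name): reserve a name chosen by the user.
allocate_global(Name) :-
    \+ global_allocated(Name),
    assertz(global_allocated(Name)).

% assign_global(+Name, +Value): replace any previous value.
assign_global(Name, Value) :-
    global_allocated(Name),
    retractall(global_value(Name, _)),
    assertz(global_value(Name, Value)).

% value_global(+Name, -Value)
value_global(Name, Value) :-
    global_value(Name, Value).

% remove_global(+Name): deallocate the name and drop its value.
remove_global(Name) :-
    retract(global_allocated(Name)),
    retractall(global_value(Name, _)).
```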

This library also contains a set of useful operations on integers. Int_equal succeeds if the two
integer arguments are equal. Int_next returns the successor of the integer given as the first
argument. Int_prev returns the predecessor of the integer given as the first argument.
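The integer helpers are small enough to write out directly. This sketch follows the synopsis forms and assumes an ISO-style Prolog; the library's actual definitions may differ.

```prolog
% int_equal(+A, +B): succeeds when both tagged integers are the same.
int_equal(int(X), int(X)).

% int_next(+A, -B): B is the successor of A.
int_next(int(X), int(Y)) :- Y is X + 1.

% int_prev(+A, -B): B is the predecessor of A.
int_prev(int(X), int(Y)) :- Y is X - 1.
```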

SEE  ALSO 

please_intro(1), encompass_intro(1), Programming in Prolog by Clocksin and Mellish
AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252 Digital Computer Laboratory, 1304 West Springfield Avenue, Urbana, IL 61801.
NAME

ptoplg  - functions  to  enable  execution  of  Prolog  commands  from  Pascal  or  Path  Pascal 

SYNOPSIS 

#include  "ptoplg.h" 

var  inbuf,  outbuf  : plgbuf  ; 
debug  : integer  ; 

plgbufinit(inbuf)  ; 

plgbufappend(inbuf, '...$') ;

ptoplgcall(inbuf,  outbuf)  ; 

plgbufwrite(plgbuf)  ; 

ptoplgdebug(debug)  ; /*  debug  is  constant  ON  or  OFF  */ 

DESCRIPTION 

The  ptoplg  functions  provide  a means  for  Pascal  or  Path  Pascal  programs  to  execute  Prolog 
clauses. 

The ptoplg layer is one layer in the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and post-conditions
written in predicate logic. These pre- and post-conditions are transformed into Prolog and
executed by the UNSW Prolog interpreter.

Ptoplg is a text interface between Pascal and Prolog. Pascal programs can communicate with Prolog
through plgbufs (Prolog buffers). An empty buffer can be created by declaring it as a plgbuf and
then calling plgbufinit with it as the single argument. Strings can be appended to the end of the
buffer with plgbufappend. It is important to note that the strings passed to plgbufappend must end
with a '$'. To clear a buffer, call plgbufinit with the desired buffer as the argument. Plgbufwrite
writes the contents of the buffer on the standard output. Prolog commands can be constructed in
these buffers using plgbufinit and plgbufappend and then sent to Prolog using ptoplgcall. The
command is placed in the inbuf, and the results of the command executed by Prolog are returned in
the outbuf. Each plgbuf is 4K bytes in size.

The ptoplgdebug function turns debugging on or off for the ptoplg layer. If debug is set to ON, a
constant defined in the header file, debugging is turned on. If debug is set to OFF, also a
constant defined in the header file, it is turned off.

FILES 

${ENCOMPASS}/include/ptoplg.h 
${ENCOMPASS}/lib/ptoplg.o

SEE  ALSO 

please_intro(1), encompass_intro(1), plc_intro(1)

AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
NAME

set_lib - a Prolog library of set routines for PLEASE
SYNOPSIS 

Standard set representation:

set(list([Element,...,Element]))

Concise set representation:

setc(LowElement,HighElement,NextFunction,EqualFunction)

NextFunction=FnName(_,_)

EqualFunction=FnName(_,_)

set_transform(+input,-output)
set_transform(setc(...),set(...))

set_member(+input,+input)
set_member(+input,-generated)
set_member(set(...),Element)

set_empty(+input)
set_empty(set(...))

set_union(+input,+input,-output)
set_union(set(...),set(...),set(...))

set_intersection(+input,+input,-output)
set_intersection(set(...),set(...),set(...))

set_difference(+input,+input,-output)
set_difference(set(...),set(...),set(...))

set_subset(+input,+input)

set_subset(set(...),set(...))

set_equal(+input,+input)

set_equal(set(...),set(...))

set_insert_element(+input,+input,-output)
set_insert_element(Element,set(...),set(...))

set_remove_element(+input,+input,-output)
set_remove_element(Element,set(...),set(...))

DESCRIPTION 

Set_lib provides a Prolog library of predicates and functions to operate on sets.

Set_lib is a library of set routines for the PLEASE system (see please_intro(1)). PLEASE is an
executable specification language. It is an extension of Path Pascal and supports the Vienna
Development Method. In the PLEASE system, programs are specified using pre- and post-conditions
written in predicate logic. These pre- and post-conditions are transformed into Prolog and
executed by the UNSW Prolog interpreter. Calls to set_lib functions may be included in these
prototypes.

Set_lib is written in Prolog (see Programming in Prolog by Clocksin and Mellish). There are two
representations for sets, a standard representation and a concise representation. All set operations
are performed on the standard representation of a set. The standard representation of a set
contains a list that enumerates the elements of the set. The concise representation of a set
includes the low element in the set, the high element in the set, the head of a clause that will
produce the "next" element of the set, and the head of a clause that will determine if two elements
of the set are "equal". The concise representation provides a means for giving a short description
of a large set (too large to enumerate). See lib_intro(3) for a description of PLEASE data types and
general information about operations on those data types.

The  set  library  provides  predicates  for  determining  if  an  element  is  present  in  a set,  if  a set  is 
empty,  if  one  set  is  a subset  of  another,  or  if  two  sets  are  equal  (are  made  up  of  the  same  ele- 
ments). The  set  library  provides  functions  for  finding  the  union  or  intersection  of  two  sets,  finding 
the  difference  of  two  sets  (the  set  difference  A-B  is  the  set  of  all  elements  in  A that  are  not  con- 
tained in  B),  inserting  an  element  in  a set,  or  removing  an  element  from  a set.  All  of  these  opera- 
tions work  on  standard  set  representations.  There  is  a function  that  converts  a concise  representa- 
tion into  a standard  representation. 

Set_transform takes a concise set representation as its first argument and returns the corresponding
standard set representation as its second argument. It is important to remember that if a concise
set representation is given, the user MUST provide function definitions for the next function and
the equal function.
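Set_lib itself is written in Prolog; purely as an illustration, the Python sketch below mirrors what set_transform is described as doing. It assumes that the concise representation can be modeled by a low element, a high element, and two user-supplied functions standing in for the next and equal clauses, and that repeatedly applying next to the low element eventually reaches an element equal to the high one. The names `ConciseSet`, `next_fn`, `equal_fn`, and the `limit` guard are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ConciseSet:
    """Hypothetical Python mirror of set_lib's concise representation."""
    low: Any
    high: Any
    next_fn: Callable[[Any], Any]         # stands in for the "next" clause
    equal_fn: Callable[[Any, Any], bool]  # stands in for the "equal" clause


def set_transform(concise: ConciseSet, limit: int = 1_000_000) -> List[Any]:
    """Enumerate a concise set into a standard (list) representation.

    Assumes that applying next_fn repeatedly, starting from low, reaches an
    element equal to high; `limit` guards against a runaway enumeration.
    """
    elements = [concise.low]
    current = concise.low
    while not concise.equal_fn(current, concise.high):
        if len(elements) >= limit:
            raise ValueError("high element not reached; check next_fn/equal_fn")
        current = concise.next_fn(current)
        elements.append(current)
    return elements


# The even numbers from 2 to 10, described concisely rather than enumerated.
evens = ConciseSet(low=2, high=10,
                   next_fn=lambda x: x + 2,
                   equal_fn=lambda a, b: a == b)
print(set_transform(evens))  # [2, 4, 6, 8, 10]
```

As in the Prolog description, the enumeration is only as reliable as the user's next and equal definitions.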

Set_member determines if its second argument (an element) is a member of its first argument (a
standard set representation). If the second argument is a variable, set_member will work as a
generator to successively generate the members of the set during backtracking.
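A Prolog predicate that generates members on backtracking has no exact Python counterpart, but a generator is a reasonable analogue. In this sketch a standard set is assumed to be a plain Python list, and `set_members` is a hypothetical name for the generating mode; the argument order follows the description above (set first, element second).

```python
from typing import Any, Iterator, List


def set_member(standard_set: List[Any], element: Any) -> bool:
    """Test mode: is `element` in the standard (list) representation?"""
    return element in standard_set


def set_members(standard_set: List[Any]) -> Iterator[Any]:
    """Generator mode: yield each member in turn, as backtracking would."""
    for element in standard_set:
        yield element


colors = ["red", "green", "blue"]
print(set_member(colors, "green"))  # True
for color in set_members(colors):   # red, green, blue, one per "backtrack"
    print(color)
```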

Set_empty  determines  if  its  argument  (a  standard  set  representation)  is  empty. 

Set_union takes two sets (standard set representations) as its first two arguments and returns the
union of those two sets. Set_intersection returns the intersection of the first two sets.

Set_difference finds the difference of its first two arguments. The set difference A-B is all the
elements of set A that are not in set B (A does not have to be a superset of B). Set_difference(A,B,C)
will produce C=A-B.
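Assuming a standard set can be modeled as a duplicate-free Python list, union, intersection, and difference might be sketched as below. Element equality here is Python's `==`, which may not match how set_lib compares elements.

```python
from typing import Any, List


def set_union(a: List[Any], b: List[Any]) -> List[Any]:
    """Elements of a, then those elements of b not already in a.

    Assumes both inputs are duplicate-free, as standard sets should be.
    """
    return a + [x for x in b if x not in a]


def set_intersection(a: List[Any], b: List[Any]) -> List[Any]:
    """Elements of a that also appear in b."""
    return [x for x in a if x in b]


def set_difference(a: List[Any], b: List[Any]) -> List[Any]:
    """A-B: elements of a not in b; a need not be a superset of b."""
    return [x for x in a if x not in b]


A, B = [1, 2, 3, 4], [3, 4, 5]
print(set_union(A, B))         # [1, 2, 3, 4, 5]
print(set_intersection(A, B))  # [3, 4]
print(set_difference(A, B))    # [1, 2]
```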

Set_subset determines if its second argument is a subset of the first argument. Again, both
arguments must be standard set representations.

Set_equal determines if its two arguments are equal. Both arguments must be standard set
representations.
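A Python sketch of the two tests, again treating a standard set as a duplicate-free list. As in the description of set_subset, it is the second argument that must be contained in the first; defining equality as mutual containment is an assumption that makes element order irrelevant.

```python
from typing import Any, List


def set_subset(a: List[Any], b: List[Any]) -> bool:
    """True if b (the second argument) is a subset of a (the first)."""
    return all(x in a for x in b)


def set_equal(a: List[Any], b: List[Any]) -> bool:
    """Equal sets contain each other; the order of elements is irrelevant."""
    return set_subset(a, b) and set_subset(b, a)


print(set_subset([1, 2, 3], [3, 1]))    # True
print(set_equal([1, 2, 3], [3, 2, 1]))  # True
print(set_equal([1, 2], [1, 2, 3]))     # False
```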

Set_insert_element inserts the first argument (an element) into the second argument (a standard
set representation). The third argument is the new set. If the element is already present, the new
set is the same as the old set.

Set_remove_element removes the first argument (an element) from the second argument (a standard
set representation). The third argument is the new set. If the element was not present, the new
set is the same as the old set.
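The insertion and removal functions can be sketched the same way. Returning a new list rather than modifying the argument is meant to mirror the separate output argument of the Prolog versions.

```python
from typing import Any, List


def set_insert_element(element: Any, s: List[Any]) -> List[Any]:
    """New set with element added; a copy of s if element is already present."""
    return list(s) if element in s else s + [element]


def set_remove_element(element: Any, s: List[Any]) -> List[Any]:
    """New set without element; equal to s if element was absent."""
    return [x for x in s if x != element]


print(set_insert_element(4, [1, 2, 3]))  # [1, 2, 3, 4]
print(set_remove_element(2, [1, 2, 3]))  # [1, 3]
```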

SEE  ALSO 

lib_intro(3), please_intro(1), encompass_intro(1), Programming in Prolog by Clocksin and Mellish
AUTHOR 

Philip  R.  Roberts,  Robert  B.  Terwilliger,  Department  of  Computer  Science,  University  of  Illinois, 
252  Digital  Computer  Laboratory,  1304  West  Springfield  Avenue,  Urbana,  IL  61801. 
Programmers can use differences between versions of a program for a variety of purposes. Some
people have acknowledged this usefulness, but few have done anything to help the programmer view
differences more efficiently. Many researchers recognize the usefulness of tools which allow the
programmer to refer to and manipulate programs in terms of their lexical, syntactic, and semantic
structure. The plethora of syntax-directed and language-oriented editors, and of environments
surrounding these editors, testifies to this recognition. No attention, however, has been given to
extending this ability to the viewing and manipulation of differences.

My  thesis  is  that  an  interactive  difference  viewing  system,  which  includes  the  ability  to  organize 
differences  based  on  the  lexical  and  syntactic  structure  of  the  program,  can  help  a programmer  use 
differences  between  versions  of  a program. 

1.  Why  View  Differences 

A programmer,  working  in  either  development  or  maintenance,  may  want  to  view  differences 
between  versions  of  a program.  During  program  development,  several  situations  may  prompt  a program- 
mer to  look  at  the  differences  between  versions  of  a program.  If  several  programmers  are  working  on  a 
project,  a programmer  who  makes  a change  to  shared  code  could  see  the  changes  that  he  or  she  has  made 
by  looking  at  the  differences  between  the  version  with  the  changes  and  the  main  version.  In  this  way,  the 
programmer  can  easily  check  changes  to  see  if  they  look  complete  before  inflicting  them  on  the  rest  of  the 
group.  Checking  whether  the  changes  will  affect  someone  else  also  should  be  easier. 

A programmer might also want to see the differences between his or her own version of a file and
another programmer’s version. Each could have a version of the file if they plan to merge the versions
later, or each might have inadvertently made changes to separate copies. In either case, the versions
must be merged, and viewing the differences lets the programmers check fairly easily whether the
changes are compatible [Heckel, 1978].

A programmer working on maintaining a program has many reasons to look at differences between
versions of a program. One of the most common is probably the need to find a newly introduced bug.
While modifying a program, a programmer may accidentally cause an error. Seeing the differences
between the working version and the nonworking version can help pinpoint the cause more quickly
[Heckel, 1978].

A project  might  have  more  than  one  programmer  working  on  its  maintenance.  A large  portion  of 
maintenance  is  understanding  what  the  program  and  the  procedures  that  must  change  do  and  what 
ramifications  a change  might  have.  If  a maintainer  is  planning  a modification  and  has  looked  at  the  pro- 
gram before,  but  since  then  someone  else  has  modified  the  program,  differences  could  help  the  maintainer 
understand  the  program  again.  If  the  programmer  remembers  what  the  program  did  before  the  other 
changes,  looking  at  the  differences  between  the  version  with  which  he  or  she  last  worked  and  the  current 
version  could  update  his  or  her  understanding  of  the  program  more  quickly  than  looking  at  the  entire  pro- 
gram again. 

Viewing differences can also help a programmer see how something was done in the past. This could
be  useful  in  two  situations.  Suppose  the  way  some  feature  was  implemented  was  changed.  After  several 
modifications  to  other  things,  it  became  clear  that  the  new  method  was  inadequate.  It  would  then  be 
necessary  to  go  back  to  the  old  method  or  to  try  to  incorporate  some  features  of  the  old  method  into  the 
new.  Simply  going  back  to  a version  which  used  the  old  method  is  not  possible  since  other  changes  have 
been  made.  The  programmers  could  look  at  the  differences  between  the  last  version  using  the  old  method 
and the first version using the new, or between the last version using the old method and the current version. These
differences  would  show  the  differences  between  the  two  methods.  (The  latter  would  include  unrelated 
differences,  but  might  be  necessary  if  the  new  method  has  changed  since  its  inception.)  Seeing  these 
differences  might  also  be  helpful  for  a programmer  who  had  another  program  to  modify.  If  this  other  pro- 
gram uses  either  the  old  or  new  method  of  the  program  that  has  been  changed,  and  the  method  must  be 
changed  in  the  other  program,  viewing  the  differences  for  the  first  program  could  be  instructive. 

Related  to  the  second  use  of  the  differences  mentioned  in  the  previous  paragraph,  seeing  differences  of 
one  program  may  help  in  customizing  another.  Suppose  one  program  already  has  several  versions  for 
different  machines  or  operating  systems.  A second  program  has  been  written  for  one  of  these  systems  but 
needs  versions  for  others.  The  differences  between  versions  for  different  systems  for  the  first  program  will 
show a programmer the types of things in the second program that might need to change and how they
might be changed.

One final reason for a maintainer to look at differences between versions is to find a quick fix for a
bug  in  a version  in  the  field.  A version  being  used  by  a customer  may  have  a bug  which  has  been  corrected 
in  a version  under  development.  The  customer  may  need  to  have  the  bug  fixed  before  the  new  version  is 
released.  The  developers  cannot  simply  give  the  customer  the  version  under  development  since  it  may  not 
be  complete  or  giving  the  customer  a new  version  might  be  against  the  company’s  policies.  Looking  at  the 
differences  between  the  customer’s  version  and  the  developer’s  version  may  let  a programmer  find  a fix  for 
the  bug  without  duplicating  the  effort  that  already  went  into  fixing  the  bug  in  the  development  version. 

Some  reasons  for  viewing  differences  apply  to  both  development  and  maintenance.  At  either  stage,  a 
programmer  may  have  several  changes  to  make  to  a version  of  a program.  The  programmer  may  elect  to 
make  the  changes  in  stages.  Each  change  or  set  of  changes  can  be  made  and  tested  individually.  After 
making  some  changes,  the  programmer  may  not  remember  which  changes  are  complete,  which  partially 
complete,  and  which  not  started.  The  differences  between  the  version  the  programmer  began  modifying 
and  the  version  he  or  she  has  changed  give  an  easy  way  to  check  the  changes  [Heckel,  1978]. 

After  finishing  a set  of  changes,  the  programmer  can  use  the  differences  between  the  old  version  and 
the  new  one  to  check  that  all  the  changes  are  documented.  The  programmer  can  check  that  comments  in 
the code document the changes and that existing comments have been updated to reflect the new
situation. The differences are also useful for reviewing all the changes so that a record of what has
changed may be kept as part of a version control system [Thompson, 1980].

In  either  development  or  maintenance,  going  back  to  an  older  version  may  be  necessary  because  of  an 
incorrect  change.  However,  more  changes  than  just  the  incorrect  one  may  have  been  made.  Seeing  the 
differences  between  the  current  version  and  the  one  that  does  not  have  the  incorrect  change  will  show  the 
programmer  what  other  changes  will  be  lost  by  going  back  to  the  old  version. 

Finally,  if  a programmer  wants  to  see  a history  of  a program,  he  or  she  may  want  more  detail  than  a 
summary  of  the  changes  made  between  each  version,  but  not  the  text  of  all  the  versions.  The  differences 
between versions are a compromise in the amount of detail and may provide what the programmer wants
without providing much excess information [Tichy, 1982].

2.  Features  for  a Difference  Viewing  System 

A system  for  viewing  differences  between  programs  should  have  many  features.  It  should  be  interac- 
tive. The  user  should  be  shown  one  difference  at  a time  and  be  allowed  to  skip  backward  and  forward  in 
the  set  of  differences  shown. 
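One minimal way to model the one-difference-at-a-time interaction is a cursor over a list of differences. The `DiffBrowser` class below is a hypothetical Python sketch, not part of any described system; it stops at either end of the list instead of wrapping around.

```python
from typing import List, Sequence


class DiffBrowser:
    """Hypothetical cursor that shows one difference at a time."""

    def __init__(self, differences: Sequence[str]) -> None:
        if not differences:
            raise ValueError("no differences to show")
        self.differences: List[str] = list(differences)
        self.index = 0

    def current(self) -> str:
        return self.differences[self.index]

    def forward(self, count: int = 1) -> str:
        # Skip ahead, stopping at the last difference rather than wrapping.
        self.index = min(self.index + count, len(self.differences) - 1)
        return self.current()

    def backward(self, count: int = 1) -> str:
        # Skip back, stopping at the first difference.
        self.index = max(self.index - count, 0)
        return self.current()


browser = DiffBrowser(["diff 1", "diff 2", "diff 3"])
print(browser.forward(2))  # diff 3
print(browser.forward())   # diff 3 (already at the end)
print(browser.backward())  # diff 2
```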

The  exact  difference  between  the  two  pieces  shown  should  be  highlighted  in  some  way.  It  is  very 
frustrating to the user to be shown two long, nearly identical lines, one from each version, and not to be
shown  what  makes  them  different.  The  user  is  forced  to  scan  the  text  to  determine  the  change  himself  or 
herself,  a task  which  the  computer  could  do  easily,  much  more  quickly,  and  with  fewer  mistakes.  If  the 
difference  is  flagged  for  some  reason  which  is  not  visible,  for  example,  blanks  on  the  end  of  one  line  but  not 
the  other,  the  user  will  waste  quite  a bit  of  time  trying  to  determine  that  no  significant  difference  exists. 
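As an illustration of highlighting only the exact change, the sketch below uses Python's standard `difflib.SequenceMatcher` at the character level. This is a stand-in technique, not the method of the proposed system. The markers `[-...-]` and `{+...+}` are arbitrary choices, and blanks inside a changed span are shown as `·` so that a difference such as trailing blanks becomes visible.

```python
import difflib


def highlight(old: str, new: str) -> str:
    """Mark only the characters that differ between two similar lines.

    Deleted text is shown as [-...-] and inserted text as {+...+};
    blanks inside a changed span are shown as '·'.
    """
    def visible(text: str) -> str:
        return text.replace(" ", "·")

    out = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
            continue
        if i2 > i1:  # text present only in the old line
            out.append("[-" + visible(old[i1:i2]) + "-]")
        if j2 > j1:  # text present only in the new line
            out.append("{+" + visible(new[j1:j2]) + "+}")
    return "".join(out)


print(highlight("if count > limit then", "if count >= limit then"))
# if count >{+=+} limit then
print(highlight("x := 1;", "x := 1;  "))
# x := 1;{+··+}
```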

The  user  should  be  able  to  select  parts  of  the  program  for  which  differences  should  be  displayed.  He 
or  she  may  be  interested  in  changes  to  only  certain  sections  of  the  program.  The  viewing  system  should 
not  force  the  user  to  look  at  differences  which  he  or  she  does  not  want. 

The  user  should  be  able  to  select  the  amount  of  context  shown  around  a difference.  Varying  amounts 
of  context  may  be  needed  for  the  user  to  identify  where  the  change  is. 
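Python's `difflib.unified_diff` has an `n` parameter for the number of unchanged context lines, which makes a convenient sketch of user-selectable context. It counts lines rather than logical units of the program, so it only approximates the behavior described; `show_diff` is a hypothetical wrapper.

```python
import difflib
from typing import List


def show_diff(old: List[str], new: List[str], context: int = 3) -> List[str]:
    """Unified-diff lines with `context` unchanged lines around each change."""
    return list(difflib.unified_diff(old, new, fromfile="old", tofile="new",
                                     lineterm="", n=context))


old = ["begin", "a := 1;", "b := 2;", "c := 3;", "d := 4;", "end"]
new = ["begin", "a := 1;", "b := 20;", "c := 3;", "d := 4;", "end"]
for width in (0, 1):
    print(f"context = {width}")
    print("\n".join(show_diff(old, new, width)))
```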

The  difference  viewing  system  should  present  differences  which  are  divided  into  logical  sections.  The 
changes  to  two  statements,  for  example,  should  be  shown  as  two  differences  regardless  of  the  relative  posi- 
tions of  the  two  statements.  Changes  to  declarations  and  executable  statements  should  be  shown 
separately.  When  several  changes  are  thrown  together  the  user  must  sort  out  which  parts  of  the 
differences  shown  belong  to  which  logical  section. 
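A rough sketch of dividing differences into logical sections, assuming the program has already been split into a list of statements (for example by a parser). When a block of k statements is replaced by k statements, the sketch reports k separate differences; that one-for-one pairing is a simplifying heuristic, not a rule taken from the text.

```python
import difflib
from typing import List, Tuple

Difference = Tuple[List[str], List[str]]  # (old statements, new statements)


def statement_differences(old: List[str], new: List[str]) -> List[Difference]:
    """Report one difference per changed statement, even when changes are adjacent.

    Each element of `old` and `new` is assumed to be one whole statement.
    """
    diffs: List[Difference] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old_part, new_part = old[i1:i2], new[j1:j2]
        # Replacing k statements by k statements becomes k one-for-one changes.
        if tag == "replace" and len(old_part) == len(new_part):
            diffs.extend(([o], [n]) for o, n in zip(old_part, new_part))
        else:
            diffs.append((old_part, new_part))
    return diffs


old = ["x := 1;", "y := 2;", "z := 3;"]
new = ["x := 10;", "y := 20;", "z := 3;"]
for before, after in statement_differences(old, new):
    print(before, "->", after)
```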

The  system  could  determine  context  based  on  the  logical  sections  of  the  program.  This  makes  more 
sense  than  using  line  boundaries.  However,  the  system  must  not  issue  a large  amount  of  context.  The  user 
will not want a page of context, so context based on logical units must be tempered by the amount of
output.

The difference viewing system should be able to summarize differences. The user may only need to
know  which  procedures  have  changed,  for  instance.  If  none  of  the  procedures  in  which  the  user  is 
interested  have  changed,  he  or  she  need  not  look  at  all  the  differences.  In  other  situations,  knowing  which 
procedures  have  changed  may  be  enough  to  remind  the  user  of  the  changes. 
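For a procedure-level summary, one possible approach is to compare the syntax trees of corresponding procedures. The sketch below uses Python's `ast` module on Python source as a stand-in for the Pascal-like languages considered here; because `ast.dump` omits line and column information by default, changes in layout alone do not mark a procedure as changed. The function names are hypothetical.

```python
import ast
from typing import Dict, List


def procedures(source: str) -> Dict[str, str]:
    """Map each top-level function name to a dump of its syntax tree."""
    tree = ast.parse(source)
    return {node.name: ast.dump(node)
            for node in tree.body
            if isinstance(node, ast.FunctionDef)}


def changed_procedures(old_src: str, new_src: str) -> Dict[str, List[str]]:
    """Summarize differences at the procedure level only."""
    old, new = procedures(old_src), procedures(new_src)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


old_v = "def f(x):\n    return x + 1\n\ndef g():\n    pass\n"
new_v = "def f(x):\n    return x + 2\n\ndef g():\n\n    pass\n\ndef h():\n    pass\n"
print(changed_procedures(old_v, new_v))
```

In this example `g` gains only a blank line, so it is not reported; `f` is reported as changed and `h` as added.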

Summaries  at  various  levels  should  be  available.  The  information  that  the  variable  declarations 
changed  could  tell  the  user  that  the  changes  are  or  are  not  relevant  to  what  he  or  she  needs  to  know. 

The  user  should  be  able  to  select  the  level  of  the  summary  from  a set  of  levels.  He  or  she  may  be 
looking  for  changes  in  variable  declarations,  or  may  know  which  procedures  changed  but  want  to  see  what 
statements  have  changed.  The  user  may  want  to  skip  summaries  and  just  see  the  text  that  changed. 

The  choices  of  summaries  should  be  interactive.  The  user  should  be  able  to  get  a summary  of  which 
procedures,  for  example,  have  changed,  then  ask  for  more  detail,  that  is,  a summary  at  a lower  level,  for 
some of the procedures. The user should be able to choose which summaries are expanded into more
detail; the system should not force the user to see more detail for all the procedures. The user should
be able to keep asking for more detail on particular differences until the text of the differences is
displayed.

The system should allow the user to simply ask for more detail without specifying a level. The system
should select a reasonable level of detail to present to the user. If the change is such that several
levels will present nearly the same information, the system should use the lowest of these levels. The user
should  not  be  shown  several  levels  which  do  not  appreciably  increase  the  information  provided. 

In  order  for  the  summaries  to  be  useful,  each  construct  to  appear  in  summaries  should  have  a name 
and  a scheme  by  which  an  identifying  name  for  a specific  instance  of  the  structure  can  be  found.  The 
name  of  the  kind  of  construct  is  apparent.  These  would  include  procedure,  variable  declarations,  assign- 
ment statement,  while  statement,  and  expression.  Clearly  each  instance  must  get  an  identifying  name.  If 
three  procedures  change,  having  a system  print  "a  procedure  changed"  three  times  is  not  very  useful. 
Finding an identifying name for procedures is easy, but the system also needs a scheme for naming
assignment statements, variable declarations, while statements, and other constructs. Some of the
information the system could use as names includes: for a statement, the kind of statement augmented
with some distinguishing feature (for example, assignment statement plus the variable whose value
changes); for a while, if, or repeat statement, the kind of statement and the condition; for a case
statement, the word case and the case expression; for a declaration, the name of the object declared.
If versions of a letter, divided into paragraphs, sentences, and words, are compared at the paragraph
level, the text of the first sentence or the first words of the first sentence might identify the
paragraph. In a list of objects with no particular distinguishing feature, the position, or number,
within the list of the object which changed could be used.
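The naming scheme above can be sketched with Python's `ast` module, again using Python syntax as a stand-in; `ast.unparse` requires Python 3.9 or later. Only a few of the constructs listed are handled, and anything else falls back to the name of its node type. The `label` and `labels` functions are hypothetical.

```python
import ast
from typing import List


def label(node: ast.AST) -> str:
    """Produce an identifying name for one construct."""
    if isinstance(node, ast.FunctionDef):
        return f"procedure {node.name}"
    if isinstance(node, ast.Assign):
        # Kind of statement plus the variable(s) whose value changes.
        targets = ", ".join(ast.unparse(t) for t in node.targets)
        return f"assignment to {targets}"
    if isinstance(node, (ast.While, ast.If)):
        # Kind of statement plus its condition.
        kind = "while" if isinstance(node, ast.While) else "if"
        return f"{kind} {ast.unparse(node.test)}"
    return type(node).__name__  # fallback: just the kind of construct


def labels(source: str) -> List[str]:
    return [label(stmt) for stmt in ast.parse(source).body]


code = "total = 0\nwhile n > 0:\n    n -= 1\nif total == 0:\n    pass\n"
print(labels(code))
# ['assignment to total', 'while n > 0', 'if total == 0']
```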

These  names  should  also  be  used  for  labeling  differences  so  that  the  user  can  tell  where  the  change  is 
located.  The  change  could  be  labeled  by  the  procedure  in  which  it  is  contained  or  by  labels  for  all  the 
summary  levels  which  contain  it,  or  some  subset  of  this.  The  list  of  all  the  labels  would  be  more  informa- 
tive, but  could  get  too  large  to  be  displayed  practically. 

Another  desirable  feature  for  a system  that  compares  programs  is  the  ability  to  ignore  formatting 
information.  For  virtually  all  reasons  that  a programmer  wants  to  see  differences  between  program  ver- 
sions, the  formatting  is  irrelevant.  With  pretty  printing  programs  and  editors  that  format  programs,  hav- 
ing different  formats  for  versions  becomes  more  likely.  The  difference  system  should  not  produce 
differences  which  will  never  be  important. 
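A simple way to ignore formatting is to compare token sequences instead of text. The tokenizer below is a hypothetical one for a Pascal-like language (identifiers, numbers, a few two-character operators, and otherwise single characters); a real system would use the language's own lexical rules.

```python
import re
from typing import List

# Hypothetical tokenizer: words, a few two-character operators, then any
# other single non-blank character. Blanks and line breaks are never tokens.
TOKEN = re.compile(r"\w+|:=|<=|>=|<>|\S")


def tokens(text: str) -> List[str]:
    return TOKEN.findall(text)


def same_ignoring_format(old: str, new: str) -> bool:
    """True when two versions differ only in spacing, indentation or line breaks."""
    return tokens(old) == tokens(new)


old = "if x>0 then\n  y:=x"
new = "if x > 0\nthen y := x"
print(same_ignoring_format(old, new))                  # True
print(same_ignoring_format(old, "if x>1 then y:=x"))   # False
```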

The system should be able to produce the differences quickly. A faster system will encourage more use.

To  improve  speed,  the  system  should  take  advantage  of  existing,  available  information.  An  example 
of  such  information  is  the  differences  stored  in  a version  control  system.  If  this  information  will  speed  up 
the  difference  system,  it  should  be  used. 

The  difference  system  should  be  able  to  find  differences  between  any  two  versions  of  a program. 
Differences  involving  the  most  recent  versions  will  probably  be  needed  most  frequently,  so  having  these 
combinations  favored  could  improve  efficiency,  but  all  combinations  should  be  possible. 

For  all  the  options  which  the  system  has,  the  user  should  not  have  to  specify  which  option  to  use. 
The  system  should  have  reasonable  defaults  for  all  options.  This  will  save  time  for  the  user. 
In addition, the difference system should incorporate principles of good interface design, and it
should be able to use the screen capabilities of terminals when possible.

In  sum,  a difference  viewing  system  should  be  interactive,  highlight  the  exact  difference,  allow  the 
user  to  restrict  which  parts  of  the  program  have  differences  shown,  allow  selection  of  the  amount  of  con- 
text to  show,  divide  differences  into  logical  sections,  use  these  logical  sections  to  determine  context  when 
practical,  summarize  differences  at  various  levels,  allowing  the  user  to  select  the  level  if  he  or  she  chooses 
and  to  interactively  elect  to  see  more  detail  for  some  differences,  identify  the  location  of  differences  with 
labels  from  the  summaries,  be  able  to  ignore  formatting  changes  in  finding  differences,  be  fairly  fast,  use 
available  information  from  other  sources,  and  work  with  any  version  from  a version  control  system. 

A difference  viewing  system  should  also  be  integrated  with  other  tools.  The  interactions  between  the 
difference  system  and  the  other  tools  will  help  both. 

The  difference  system  should  be  integrated  with  an  editor.  This  should  allow  the  user  to  easily  see 
differences  between  the  version  being  edited  and  other  versions.  The  user  will  then  be  able  to  see  what 
changes  he  or  she  has  made. 

The  difference  system  should  provide  the  editor  with  an  undo  command  based  on  the  differences.  A 
difference-based  undo  allows  the  user  to  view  differences  and  select  which  to  undo.  (The  user  could  be 
allowed  to  select  differences  to  undo  after  having  viewed  all  the  differences,  or  be  allowed  to  select 
differences  to  undo  as  they  are  displayed.)  Undoing  a difference  consists  of  deleting  the  text  that  is  in  the 
new  version  and  replacing  it  with  the  text  in  the  old  version.  The  changes  that  can  be  undone  are  limited 
by which versions of the file exist.
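A difference-based undo can be sketched as a walk over the differences between the old and new versions, restoring the old text only for the differences the user selected. Numbering the differences in the order `difflib` reports them is an assumption made for this Python illustration, and `undo` is a hypothetical name.

```python
import difflib
from typing import List, Set


def undo(old: List[str], new: List[str], selected: Set[int]) -> List[str]:
    """Return `new` with the selected differences reverted to their `old` text.

    Differences are numbered 0, 1, 2, ... in the order difflib reports them;
    undoing one replaces its new-version lines with the old-version lines.
    """
    result: List[str] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    diff_number = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(new[j1:j2])
            continue
        if diff_number in selected:
            result.extend(old[i1:i2])  # restore the old text
        else:
            result.extend(new[j1:j2])  # keep the change
        diff_number += 1
    return result


old = ["a := 1;", "b := 2;", "c := 3;"]
new = ["a := 10;", "b := 2;", "c := 30;", "d := 4;"]
print(undo(old, new, {0}))  # ['a := 1;', 'b := 2;', 'c := 30;', 'd := 4;']
```

Because `undo` builds a new list, the version being edited is left untouched, which fits the idea of testing a temporary version.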

The  undo  should  take  advantage  of  the  difference  system  dividing  and  summarizing  differences. 
Dividing  differences  lets  the  user  choose  a smaller  unit  to  undo.  If  changes  were  not  divided,  the  user 
would  not  be  able  to  undo  one  change  without  undoing  all  the  others.  Dividing  differences  makes  a 
difference-based  undo  more  responsive. 

Summarizing  differences  also  makes  a difference-based  undo  more  convenient.  If  all  the  changes  in  a 
procedure  need  to  be  undone,  the  user  can  get  a summary  of  changes  at  the  procedure  level  and  ask  for 
that difference, which could contain many textual differences, to be undone. The difference-based undo
operating  on  the  summary  level  allows  the  user  to  restore  one  procedure,  say,  to  a previous  state  without 
having  to  request  the  undoing  of  each  difference  individually. 

The  user  should  be  able  to  take  the  version  he  or  she  is  editing,  choose  some  differences  to  undo,  and 
easily  create  another  version  based  on  the  current  version  with  the  selected  differences  undone.  This  would 
help  a programmer  in  debugging.  If  a bug  has  appeared,  the  programmer  would  be  able  to  selectively 
eliminate  changes  in  a temporary  version  without  disturbing  the  current  version.  He  or  she  could  then  test 
the  temporary  version.  If  the  bug  was  still  present,  the  programmer  could  go  back  to  the  undisturbed  ver- 
sion and  try  undoing  some  other  changes  until  the  one(s)  that  are  the  source  of  the  bug  are  found. 

The  editor  with  which  the  difference  system  is  integrated  should  be  a structural,  e.g.,  syntax-directed 
or  language-oriented,  editor.  This  kind  of  editor  will  have  the  program  represented  in  some  tree  form, 
such  as  an  abstract  syntax  tree  or  parse  tree.  This  would  make  dividing  the  differences  into  logical  units 
or summarizing the differences easier for the difference system. For these tasks, if the program were not
already  represented  in  a tree  form,  the  difference  system  would  have  to  get  it  into  such  a form  itself.  Hav- 
ing the  tree  structure  kept  makes  the  difference  system  faster  and  more  flexible. 

Having  a structural  editor  also  allows  the  difference  system  to  get  a little  extra  information.  The 
editor  can  fairly  readily  record  which  parts  of  the  program  have  changed.  This  can  help  the  difference  sys- 
tem identify  changes  more  quickly. 

Another  tool  with  which  a difference  system  should  be  integrated  is  a version  control  system.  As 
mentioned  before,  the  difference  system  should  be  able  to  compare  any  two  versions.  Also,  the  difference 
system  should  use  information  available  in  the  version  control  system.  The  version  control  system  will 
store  multiple  versions  by  storing  differences  between  versions.  If  the  user  asks  to  see  differences  for  ver- 
sions for  which  the  difference  is  stored  in  the  version  control  system,  the  difference  system  should  use  this 
to  locate  the  differences.  Further,  if  a sequence  of  differences  between  the  versions  exists,  the  difference  sys- 
tem can  combine  these  to  locate  differences  in  the  two  versions.  Using  the  information  in  the  version  con- 
trol system  will  make  locating  differences  faster. 
The difference system can also help the version control system in merging two versions. The
difficulty  with  merging  comes  when  a conflict  arises — two  versions  insert  different  text  in  the  same  place, 
or  one  version  deletes  text  around  the  location  at  which  another  version  inserts,  for  example.  The 
difference  system  can  help  in  several  ways. 

Use  of  the  divisions  of  the  differences  and  summary  levels  can  help  when  two  versions  both  have  code 
inserted  at  the  same  location.  The  differences  can  be  marked  to  indicate  what  kind  of  section  contained 
them. If the two sections to be inserted came from different kinds of sections, this information could be used to order them.
For  example,  suppose  one  version  had  declarations  inserted  at  the  end  of  the  declaration  section  and 
another  had  statements  inserted  at  the  beginning  of  the  executable  statements.  To  a merging  program 
which  considers  the  program  as  text,  this  would  look  like  two  insertions  at  the  same  spot.  But  if  the 
differences  were  marked  with  which  kind  of  section  they  were,  a merging  program  could  find  the  kind  of 
section  on  the  left  and  right  of  the  point  of  insertion  and  place  the  sections  by  the  same  kind  of  section  as 
that  from  which  they  came. 

Dividing  the  differences  into  logical  sections  would  help  if  each  version  had  inserted  both  declarations 
and  statements  in  the  same  spot.  The  new  declarations  and  new  statements,  though  contiguous,  would  be 
divided  into  separate  differences.  Thus  in  the  merged  version  both  new  declaration  sections  could  be 
placed  together,  before  the  new  statements. 
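The two merging ideas above can be sketched, in simplified form, as ordering insertions that land at the same point by the kind of section they came from. The `SECTION_ORDER` table and the two kinds are assumptions; the text's approach of looking at the sections on either side of the insertion point is not modeled here.

```python
from typing import List, Tuple

# Hypothetical ordering of section kinds within a block.
SECTION_ORDER = {"declaration": 0, "statement": 1}

Insertion = Tuple[str, List[str]]  # (kind of section, inserted lines)


def place_insertions(insertions: List[Insertion]) -> List[str]:
    """Order insertions made at the same text position by section kind.

    New declarations come before new statements; insertions of the same
    kind keep their relative order because sorted() is stable.
    """
    ordered = sorted(insertions, key=lambda ins: SECTION_ORDER[ins[0]])
    return [line for _, lines in ordered for line in lines]


# Each version inserted a declaration and a statement at the same spot.
from_a = [("declaration", ["var i: integer;"]), ("statement", ["i := 0;"])]
from_b = [("declaration", ["var j: integer;"]), ("statement", ["j := 0;"])]
print(place_insertions(from_a + from_b))
# ['var i: integer;', 'var j: integer;', 'i := 0;', 'j := 0;']
```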

If the difference system is integrated with a structural editor, differences can be computed on tokens
rather than lines. Doing so eliminates some conflicts. Changes which were made to the same line in two versions and
which  are  separated  by  a token  will  not  conflict.  This  situation  could  arise  commonly  when  elements  are 
added  to  a list,  such  as  a list  of  variables  being  declared  or  the  definition  of  an  enumerated  type. 
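To show why token-level differences avoid some conflicts, the sketch below records, for each version, which ranges of the common base it changed, and reports a conflict only when ranges from the two versions overlap. It uses `difflib` and a simplified overlap rule; both are assumptions for illustration, not the method of any particular merge tool.

```python
import difflib
from typing import List, Tuple

Range = Tuple[int, int]  # half-open range of base positions touched by a change


def changed_ranges(base: List[str], version: List[str]) -> List[Range]:
    matcher = difflib.SequenceMatcher(None, base, version, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag != "equal"]


def overlaps(a: Range, b: Range) -> bool:
    (a1, a2), (b1, b2) = a, b
    if a1 == a2 and b1 == b2:  # two insertions: conflict only at the same point
        return a1 == b1
    if a1 == a2:               # insertion strictly inside b's range
        return b1 < a1 < b2
    if b1 == b2:
        return a1 < b1 < a2
    return a1 < b2 and b1 < a2  # two non-empty ranges that intersect


def conflicts(base: List[str], a: List[str], b: List[str]) -> bool:
    return any(overlaps(x, y)
               for x in changed_ranges(base, a)
               for y in changed_ranges(base, b))


base = "var a , b : integer ;".split()
ver_a = "var a , b , c : integer ;".split()   # appends c to the list
ver_b = "var x , a , b : integer ;".split()   # prepends x to the list
print(conflicts(base, ver_a, ver_b))          # False: disjoint token changes
print(conflicts([" ".join(base)], [" ".join(ver_a)], [" ".join(ver_b)]))  # True: same line
```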

When  conflicts  arise,  the  user  must  look  at  the  problem  area  and  edit  the  merged  version  so  that  it  is 
correct.  This  should  be  interactive  in  a manner  similar  to  difference  viewing.  The  interactive  system  could 
take the user from one conflict to another. It is important to let the user easily see other parts of
the program, so that he or she can see results of merging that did not cause a conflict but may bear on
how to resolve one. The conflicts should be labeled by their locations. Getting summaries of where
conflicts are located might also be useful. With several versions being merged, the person attempting to resolve
conflicts  may  not  know  enough  to  resolve  them  all.  The  user  could  select  the  conflicts  in  the  procedures 
which  he  or  she  changed  and  let  other  people  resolve  others. 

Having  conflicts  resolved  with  the  aid  of  a special  tool  allows  commands  specific  to  merging  to  be 
included.  The  specific  commands  would  depend  upon  how  conflicts  are  represented.  Some  possibilities 
include  leaving  the  code  as  it  is,  selecting  the  text  of  one  version  or  the  other,  or  asking  for  the  text  from 
one  version  followed  by  that  from  the  other. 
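The merge-specific commands listed above can be sketched as a small dispatcher. The `Conflict` record, its fields, and the command names `keep`, `a`, `b`, and `a+b` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conflict:
    """Hypothetical conflict record: the text of each version at one spot."""
    location: str         # label such as "procedure update"
    current: List[str]    # what the merged file contains now
    version_a: List[str]
    version_b: List[str]


def resolve(conflict: Conflict, command: str) -> List[str]:
    """Apply one of the merge-specific commands described in the text."""
    if command == "keep":   # leave the code as it is
        return conflict.current
    if command == "a":      # select the text of one version
        return conflict.version_a
    if command == "b":      # or of the other
        return conflict.version_b
    if command == "a+b":    # one version followed by the other
        return conflict.version_a + conflict.version_b
    raise ValueError(f"unknown command: {command!r}")


clash = Conflict(location="procedure update",
                 current=["x := 1;", "x := 2;"],
                 version_a=["x := 1;"],
                 version_b=["x := 2;"])
print(clash.location, resolve(clash, "b"))  # procedure update ['x := 2;']
```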

Another tool with which a difference system could be integrated is a program slicer. This integration
would make the difference system more useful, though it would not benefit the slicer. A program slicer takes a point in the program and a
set  of  variables  and  finds  all  the  statements  which  affect  the  values  of  those  variables  at  that  point.  In 
essence  the  result  is  a program  which  would  give  as  results  the  values  of  the  set  of  variables  at  that  point. 

With  a program  slicer  integrated  with  the  difference  system,  the  user  should  be  able  to  ask  for  only 
differences  that  affect  the  value  of  selected  variables  at  a certain  point.  This  might  reduce  the  amount  of 
text  that  the  user  would  need  to  see. 
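A minimal sketch of filtering differences by a slice, under a strong simplifying assumption: the program is straight-line code made only of assignments, each recorded as the variable assigned and the variables used. Real slicers also follow control dependences, which this sketch ignores; `backward_slice` is a hypothetical name.

```python
from typing import List, Set, Tuple

Statement = Tuple[str, Set[str]]  # (variable assigned, variables used)


def backward_slice(program: List[Statement], point: int, variables: Set[str]) -> Set[int]:
    """Indices of statements before `point` that can affect `variables` there."""
    relevant = set(variables)
    in_slice: Set[int] = set()
    for index in range(point - 1, -1, -1):
        target, uses = program[index]
        if target in relevant:
            in_slice.add(index)
            # The target is now explained; its inputs become relevant instead.
            relevant = (relevant - {target}) | uses
    return in_slice


program = [
    ("a", {"x"}),       # 0: a := x
    ("b", {"y"}),       # 1: b := y
    ("c", {"a"}),       # 2: c := a + 1
    ("b", {"b", "c"}),  # 3: b := b + c
]
changed_statements = [1, 2]
slice_for_c = backward_slice(program, point=4, variables={"c"})
print(sorted(slice_for_c))                                  # [0, 2]
print([i for i in changed_statements if i in slice_for_c])  # [2]
```

Only the change to statement 2 would be shown to a user interested in `c` at the end of the program.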

The  difference  system  can  also  be  integrated  with  any  system  which  does  incremental  analysis  which 
can  be  batched.  Some  possible  tools  that  are  amenable  to  incremental  analysis  and  whose  results  are  not 
needed  immediately  after  each  change  include  an  incremental  recompilation  system,  a tool  which  performs 
consistency  checks  between  the  source  code  and  its  documentation  or  specification,  a test  case  generator,  a 
test  coverage  analyzer  (perhaps  with  data  flow  analysis),  and  software  management  systems.  Use  of  one 
tool  should  not  interfere  with  the  use  of  any  other  tool.  A new  tool  could  be  added  to  this  system  easily. 

3.  Previous  Work 

3.1.  Uses  of  Differences 

Differences  between  strings  have  many  uses.  They  are  used  extensively  in  biology  and  speech  recog- 
nition. The  first  use  in  computer  science,  as  indicated  by  Sankoff  and  Kruskal  [Sankoff  and  Kruskal,  1983] 
and  Hall  and  Dowling  [Hall  and  Dowling,  1980],  was  in  spelling  correction.  The  problem  is  to  find  a 
correct spelling of a misspelled word. The solution is to find, out of the set of possible correct words,
either  one  that  is  the  closest  or  one  close  enough  to  the  incorrect  word. 
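As a sketch of choosing a correction that is either the closest or close enough, the code below uses Levenshtein edit distance. This is a common measure chosen here for illustration, not the one used by any of the systems cited, and the `max_distance` threshold is an arbitrary assumption.

```python
from typing import Iterable, Optional


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        current = [i]
        for j, tc in enumerate(t, start=1):
            current.append(min(previous[j] + 1,                # delete sc
                               current[j - 1] + 1,             # insert tc
                               previous[j - 1] + (sc != tc)))  # substitute
        previous = current
    return previous[-1]


def correct(word: str, dictionary: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest dictionary word, or None if none is close enough."""
    best = min(dictionary, key=lambda w: edit_distance(word, w), default=None)
    if best is None or edit_distance(word, best) > max_distance:
        return None
    return best


words = ["procedure", "program", "progress"]
print(correct("progam", words))  # program
print(correct("xyzzy", words))   # None
```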

Several methods are based on abbreviating the words. Blair [Blair, 1960] devises an abbreviation by eliminating letters based on their positions in the word and the letters’ frequency of occurrence. Words are matched based on their abbreviations. If no match is found, the system gives up. If more than one match is found, longer abbreviations are used until only one match exists.

Davidson [Davidson, 1962] also uses abbreviations, to retrieve names in an airline reservation system. His system takes the first letter of the surname and the first three characters remaining after eliminating all vowels, hs, ws, and ys and then removing one occurrence of any letters that are doubled after this elimination. The last letter included is the first initial. Names are retrieved solely from the abbreviation. Additional information, such as the person’s phone number, is used, if available, to eliminate multiple retrievals. If this is not possible, the operator receives all the matching records and selects the correct one.
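The abbreviation step can be sketched in a few lines of Python. This is one possible reading of the description above, not Davidson's published procedure: the function name davidson_key, the exact set of eliminated letters, and the treatment of doubled letters (collapsing adjacent repeats to one occurrence) are all assumptions.

```python
def davidson_key(surname: str, first_initial: str) -> str:
    """Hypothetical Davidson-style name abbreviation (details assumed)."""
    s = surname.upper()
    first, rest = s[0], s[1:]
    # drop vowels and H, W, Y from the rest of the surname (assumed letter set)
    kept = [c for c in rest if c not in "AEIOUHWY"]
    # collapse adjacent doubled letters to a single occurrence
    collapsed: list[str] = []
    for c in kept:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    # first letter + up to three remaining characters + first initial
    return first + "".join(collapsed[:3]) + first_initial.upper()
```

For example, davidson_key("Hammett", "d") yields "HMTD": the A and E are dropped and the doubled M and T each collapse to one letter.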

Davidson’s system does not rely on always finding a match. If no record exactly matches the abbreviation, the records which best match the abbreviation are retrieved. How good a match is, is determined by listing the character positions that match in both abbreviations and finding the length of the longest increasing subsequence. This is also the length of the longest common subsequence of the two abbreviations.

In general, Blair’s and Davidson’s methods are applicable only to spelling. They offer no help in comparing words or other strings for any other purpose.

Faulk [Faulk, 1964] defines three measures of similarity between strings. Each is a number between zero and one, with a larger number indicating more similarity. The three numbers indicate the extent to which the strings share common elements, the extent to which the common elements are in the same order, and the extent to which the common elements are in the same positions. These measures help choose the best match out of a list, and can suggest how similar two strings are, but they are not helpful in showing the differences.

Damerau’s [Damerau, 1964] method attempts to correct words with one typing error: a substitution of one character for another, insertion or deletion of one character, or transposition of two (adjacent) characters. His method is specific to checking whether a word could be derived from the given word by one of these errors. It also includes a few steps to decrease the number of words in the vocabulary which must be tested.
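The single-error test itself can be sketched as below. This is a minimal illustration assuming case-sensitive comparison; within_one_error is a hypothetical name, and Damerau's vocabulary-pruning steps are not shown. After skipping the common prefix, each of the four error types leaves a simple condition on the remaining suffixes.

```python
def within_one_error(word: str, candidate: str) -> bool:
    """True if candidate equals word or differs from it by one substitution,
    one insertion, one deletion, or one transposition of adjacent characters."""
    if word == candidate:
        return True
    m, n = len(word), len(candidate)
    if abs(m - n) > 1:
        return False
    # skip the common prefix; i is the first position where they differ
    i = 0
    while i < min(m, n) and word[i] == candidate[i]:
        i += 1
    if m == n:
        # substitution of word[i]
        if word[i + 1:] == candidate[i + 1:]:
            return True
        # transposition of word[i] and word[i + 1]
        return (i + 1 < m and word[i] == candidate[i + 1]
                and word[i + 1] == candidate[i]
                and word[i + 2:] == candidate[i + 2:])
    if m > n:
        # deletion of word[i]
        return word[i + 1:] == candidate[i:]
    # insertion of candidate[i]
    return word[i:] == candidate[i + 1:]
```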

Alberga [Alberga, 1967] took several spelling correction methods and a set of misspellings from spelling exams to see which method did the best job. The results of this study are not directly relevant to finding differences, but the paper does give a useful summary of various spelling correction methods.

Morgan [Morgan, 1970] is interested in correcting spelling and typing errors to decrease the number of runs a user must make to get job control and programs correct. His method uses semantic information to narrow the search for possibilities. Then Damerau’s method is applied to find a correct word from the list of possibilities. The semantic information that Morgan uses includes which items are in the follow set and which identifiers in the symbol table are of the correct type.

Another area in which differences are used is the correction of syntax errors. Several methods use a cost function to help determine which correction to make. The cost, in essence, is based on the edit operations needed to transform the input into one of the possible corrections. Anderson, et al. [Anderson, et al., 1983], Graham and Rhodes [Graham and Rhodes, 1975], and Mickunas and Modry [Mickunas and Modry, 1978] all use the costs of inserting and deleting symbols to find the cost of a correction. These methods do not use the techniques for computing the minimum edit distance, but the ideas are similar.

Tai [Tai, 1978] actually uses one of the methods of minimizing edit costs, with insertions, deletions, replacements, and transpositions allowed. After finding possible corrections, the method for finding the edit cost is applied to find the correction which is closest to the input text.

Perhaps the most widely recognized use of differences in computer science is in storing multiple versions of a file. If someone wants to save several versions of a file, the versions will usually have more in common than different. Instead of saving the entire text of all versions, which would usually consist of a large amount of common material, one version can be saved in its entirety along with enough information to produce the other versions from this one. If versions are kept in this way, a set of tools should store the information necessary to retrieve versions, and the user should be able to specify the version desired and have it retrieved automatically. Since such tools must exist to keep track of versions, they usually perform other functions as well, such as keeping logs of what changes have been made and providing exclusive use of a version to a user.

Despite the savings in space that can be achieved by using differences, some systems which save versions keep the complete text of each version. The Distributed Programming Assistant [Ramsay, 1983] keeps all versions of programs and also all the supporting files that are ever produced. The Project Automated Librarian [Prager, 1983] stores entire copies of versions, but saves only a set number of them.

Other systems store multiple versions and save the common parts only once, but do not use differences. These systems keep all the versions in one file and have control information so that the appropriate lines are used for the desired version. One system [Stanaway, et al., 1979] uses conditional assembly to get the correct statements for the desired version. Another [Hague and Ford, 1976] keeps the file with control information and has a tool to extract the version needed.

Cargill [Cargill, 1980] has developed a system that uses a hierarchical directory structure to store versions. The system was developed to store the programs for an operating system intended to run on different machines, each of which has some functions that must be customized. The system is set up with a directory for each function, containing the source for the common version of that function. Any machine that needs something else has a subdirectory with the files it needs. Some space is saved since common files are stored once, but anything common to the versions for specific machines will be duplicated.

Many systems use differences saved as edit scripts to store multiple versions. A good example of this is SCCS, the Source Code Control System [Rochkind, 1975]. It saves the original version. Each additional version is saved by storing the difference between it and the version before it.

Several systems have been patterned after SCCS. Two of these are the systems developed by Pedersen and Buckle [Pedersen and Buckle, 1978] and by Bauer and Birchall [Bauer and Birchall, 1978]. Pedersen and Buckle’s system allows a tree structure of versions. Bauer and Birchall’s system performs many management functions as well as merging differences in object files when possible.
Another system which uses differences to save multiple versions is RCS, the Revision Control System [Tichy, 1982]. RCS allows a tree structure of versions. Instead of storing the oldest version and differences to generate the more recent versions (forward deltas), RCS stores the most recent version and differences to generate the older versions (backward deltas). This allows the newer versions, which presumably will be accessed more frequently, to be generated more quickly.
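The backward-delta idea can be sketched as follows. This is an illustration under stated assumptions, not RCS's storage format: versions are lists of lines, the class and function names (BackwardDeltaStore, make_delta, apply_delta) are hypothetical, and Python's difflib.SequenceMatcher stands in for the difference algorithm. Only the changed lines are stored in each delta, and the newest version is kept whole.

```python
from difflib import SequenceMatcher

# A delta is a list of (start, end, replacement lines) edits on its source.
Delta = list[tuple[int, int, list[str]]]


def make_delta(src: list[str], dst: list[str]) -> Delta:
    """Edit script that rewrites src into dst; unchanged lines are not stored."""
    ops = SequenceMatcher(a=src, b=dst, autojunk=False).get_opcodes()
    return [(i1, i2, dst[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(src: list[str], delta: Delta) -> list[str]:
    out, pos = [], 0
    for start, end, repl in delta:   # edits arrive in increasing order of start
        out.extend(src[pos:start])
        out.extend(repl)
        pos = end
    out.extend(src[pos:])
    return out


class BackwardDeltaStore:
    """Newest version stored in full, plus deltas leading back to older ones."""

    def __init__(self, first: list[str]):
        self.head = list(first)
        self.deltas: list[Delta] = []   # deltas[k] turns version k+1 into version k

    def check_in(self, new: list[str]) -> None:
        self.deltas.append(make_delta(new, self.head))
        self.head = list(new)

    def check_out(self, k: int) -> list[str]:
        text = self.head                # the newest version needs no work at all
        for delta in reversed(self.deltas[k:]):
            text = apply_delta(text, delta)
        return list(text)
```

Checking out the newest version costs nothing, while older versions cost one delta application per step back, which is the trade-off the text attributes to backward deltas.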

Another version control system [Kaiser and Habermann, 1983] concentrates on specification and management issues, rather than space considerations. What method it uses for storing versions is not stated.

A fourth use of differences in computer science is in updating text which is already at the receiving site. Differences can be used to update programs, manuals, and display screens. When a site has a version of a program or data set and needs a new version, the differences will usually be shorter than the new version and can be transmitted more quickly.

Screen-oriented programs also use differences to attempt to reduce the number of characters transmitted to update the screen display. Gosling [Gosling, 1981] describes an algorithm and a heuristic for updating the display of a screen editor, if the terminal has certain capabilities.

Some attention has been given to providing differences that can be viewed. Suppliers of operating systems often provide a general utility for finding differences between text files. UNIX [UNIX User’s Manual, 1984] and VMS [Digital Equipment Corp., 1985] are examples of operating systems which provide such a tool.

A tool under development that helps display differences between versions of programs is an editor that edits multiple versions of a program [Kruskal, 1984]. The user of the editor specifies which versions to edit. Any changes made apply to all the versions being edited, or to a subset of those if the user so specifies. Parts of the text that differ among the versions being edited are highlighted. The editor has a restore command that lets the user put text from an older version into the versions being edited.
3.2. Finding Differences

The use of general difference-finding algorithms seems to have developed in biology before developing in computer science. The first mention in the computer science literature seems to be in 1974, in two separate papers [Sellers, 1974] [Wagner and Fischer, 1974]. Sellers presents an algorithm that takes O(m²n) time and space, where m and n are the lengths of the strings being compared. This algorithm finds the smallest number of changes (deletion of a character from either string or replacement of one character with another) needed to convert both strings to the same string. Wagner and Fischer present an algorithm to find the number of insertions, deletions, and replacements of single elements needed to convert one string into the other. Their algorithm uses O(mn) time and space.
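A minimal sketch of the Wagner and Fischer dynamic program, assuming unit costs for insertion, deletion, and replacement (their formulation allows general costs); edit_distance is a hypothetical name. Each entry d[i][j] is computed from its three neighbours above and to the left, which accounts for the O(mn) time and space.

```python
def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # d[i][j] = cheapest way to turn the prefix a[:i] into the prefix b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                      # delete every element of a[:i]
    for j in range(1, n + 1):
        d[0][j] = j                      # insert every element of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = d[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,   # delete a[i-1]
                          d[i][j - 1] + 1,   # insert b[j-1]
                          replace)           # keep or replace
    return d[m][n]
```

For instance, edit_distance("kitten", "sitting") is 3: two replacements and one insertion.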

Lowrance and Wagner [Lowrance and Wagner, 1975] give an algorithm for an extension of Wagner and Fischer’s problem. They allow swapping two adjacent elements, or two elements that would be adjacent after all the deletions are performed but before any insertions are done. This algorithm also uses O(mn) time and space.

All the algorithms mentioned so far are based on a dynamic programming approach to the problem. The solution is found for substrings of the two strings. One element is added to one substring, and the solution for the new substrings is found from the solution for the smaller substrings. The substrings used are prefixes (or suffixes) of the two strings, and the solution is found for each pairing of substrings. So each entry in an m × n matrix (or an (m + 1) × (n + 1) matrix if zero-length prefixes are included) is computed. Masek and Paterson [Masek and Paterson, 1980] attempt to find the solution more quickly by precomputing all possible differences between costs in the matrix for submatrices, then combining the appropriate precomputed values for the particular strings. This produces an algorithm that executes in O(mn/log n) time, but which can only be used in problems with a finite alphabet.

Heckel [Heckel, 1978] proposes a method which is not based on dynamic programming. Heckel describes his method in terms of files and lines in the files. The algorithm enters each line into a symbol table and records information about it, such as the number of occurrences of the line in each file. If a line in the symbol table has exactly one occurrence in each file, the occurrences are considered the same. Lines which are identical and are adjacent to lines considered the same are also considered the same. Any other lines are considered to be inserted or deleted. This method finds lines that have moved as well. Its weakness is that it relies on having many lines with exactly one occurrence in each file to get a good match.
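The passes Heckel describes can be sketched as below. This is a simplified reading, not Heckel's original code: heckel_match is a hypothetical name, a Counter stands in for his symbol table, and the result maps each matched line index in the new file to its index in the old file. Unmatched lines are the insertions and deletions.

```python
from collections import Counter


def heckel_match(old: list[str], new: list[str]) -> dict[int, int]:
    """Map new-line index -> old-line index for lines considered the same."""
    old_count, new_count = Counter(old), Counter(new)
    old_pos = {line: i for i, line in enumerate(old)}
    match: dict[int, int] = {}
    # Pass 1: a line occurring exactly once in each file anchors a match.
    for j, line in enumerate(new):
        if old_count[line] == 1 and new_count[line] == 1:
            match[j] = old_pos[line]
    used = set(match.values())

    def link(j: int, i: int) -> None:
        # match identical neighbours that are not yet matched in either file
        if j not in match and i not in used and old[i] == new[j]:
            match[j] = i
            used.add(i)

    # Pass 2: extend matches forward to identical following lines.
    for j in range(len(new) - 1):
        if j in match and match[j] + 1 < len(old):
            link(j + 1, match[j] + 1)
    # Pass 3: extend matches backward to identical preceding lines.
    for j in range(len(new) - 1, 0, -1):
        if j in match and match[j] > 0:
            link(j - 1, match[j] - 1)
    return match
```

The moved line "c" in the first test below is found even though it changed position; the second test shows the weakness noted above: with no unique lines, nothing is matched.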

Tichy [Tichy, 1984] has developed a method for finding block moves. This method includes any element that occurs in both strings in a block move. This minimizes the number of elements inserted. Then the number of moves needed to generate the rest of the string is minimized. By using a suffix tree for the string, the algorithm can run in O(m + n) time and space. The advantage of Tichy’s method is that it attempts to reduce the amount of space the editing commands take. Presumably an insert command, which must include the text to insert, takes more space than a move command. A disadvantage is that the original string will not be accessed sequentially, and so, unless it can be accessed randomly, rebuilding the new string will normally require multiple passes through the original.
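The greedy idea behind block moves can be sketched as follows: at each position of the new string, copy the longest prefix that occurs anywhere in the original, and fall back to inserting a character only when no match exists. This sketch replaces the suffix tree with a naive search (so it is far slower than O(m + n)), and block_moves and rebuild are hypothetical names.

```python
def block_moves(old: str, new: str) -> list[tuple]:
    """Greedy block-move script turning old into new.

    ("move", start, length) copies old[start:start + length];
    ("insert", text) adds characters that cannot be copied from old.
    """
    script: list[tuple] = []
    q = 0
    while q < len(new):
        # naive search for the longest prefix of new[q:] occurring in old
        best_len, best_start = 0, 0
        for p in range(len(old)):
            k = 0
            while p + k < len(old) and q + k < len(new) and old[p + k] == new[q + k]:
                k += 1
            if k > best_len:
                best_len, best_start = k, p
        if best_len > 0:
            script.append(("move", best_start, best_len))
            q += best_len
        elif script and script[-1][0] == "insert":
            script[-1] = ("insert", script[-1][1] + new[q])  # extend pending insert
            q += 1
        else:
            script.append(("insert", new[q]))
            q += 1
    return script


def rebuild(old: str, script: list[tuple]) -> str:
    parts = []
    for op in script:
        if op[0] == "move":
            _, start, length = op
            parts.append(old[start:start + length])   # random access into old
        else:
            parts.append(op[1])
    return "".join(parts)
```

The moves in the test below read "def" before "abc", which illustrates why rebuilding needs random access to the original.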

A problem closely related to that of finding an edit script to convert one string into another is that of finding the longest common subsequence of two strings. The solution to the longest common subsequence problem can be used to produce an edit script by inserting the elements that are in the new string but not in the common subsequence and deleting the elements that are in the original string but not in the common subsequence. Likewise, any method that finds edit scripts with insertions and deletions can be used to find the longest common subsequence. Methods that include replacement and transposition can also be used, by setting the cost of a replacement or transposition above the combined cost of an insertion and a deletion so that inserting and deleting will always be preferred.
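The conversion from a longest common subsequence to an edit script can be sketched as below; lcs_edit_script is a hypothetical name. The table L holds LCS lengths of suffixes, and the walk keeps matching elements, deletes elements only in the original, and inserts elements only in the new string.

```python
def lcs_edit_script(a: str, b: str) -> list[tuple[str, str]]:
    m, n = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    script: list[tuple[str, str]] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:                     # element of the common subsequence
            script.append(("keep", a[i]))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:     # a[i] is not in the subsequence
            script.append(("delete", a[i]))
            i += 1
        else:                                # b[j] is not in the subsequence
            script.append(("insert", b[j]))
            j += 1
    script += [("delete", x) for x in a[i:]]
    script += [("insert", y) for y in b[j:]]
    return script
```

Dropping the deletions from the script gives the new string, and dropping the insertions gives the original.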

Hirschberg [Hirschberg, 1975] takes Wagner and Fischer’s algorithm and notes that the values of the ith row depend only on the (i − 1)th row. Thus the length of the longest common subsequence can be found using O(m + n) space. Finding the sequence itself is more difficult, but can also be done using a linear amount of space.
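Hirschberg's observation is easy to see in a length-only sketch that keeps just two rows of the table at a time; lcs_length is a hypothetical name, and the divide-and-conquer step Hirschberg uses to recover the subsequence itself in linear space is not shown.

```python
def lcs_length(a: str, b: str) -> int:
    prev = [0] * (len(b) + 1)            # row i-1 of the LCS table
    for x in a:
        cur = [0] * (len(b) + 1)         # row i, computed only from row i-1
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]
```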

Hunt and Szymanski [Hunt and Szymanski, 1977] developed an algorithm that works well when the strings match in few places. For each position in one string, the method keeps a list of the matching locations in the other string. It takes O((r + n) log n) time and O(r + n) space, where r is the number of pairs of matching positions.
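The following is a sketch in the spirit of Hunt and Szymanski, not a transcription of their paper; lcs_length_hs is a hypothetical name. For each element of one string it visits the matching positions in the other string in decreasing order, so the longest strictly increasing subsequence of matching positions equals the LCS length (the same reduction that underlies Davidson's match measure described earlier). The work is proportional to the r matching pairs, each costing one binary search.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_hs(a: str, b: str) -> int:
    # matching positions in b for each symbol
    where: defaultdict[str, list[int]] = defaultdict(list)
    for j, y in enumerate(b):
        where[y].append(j)
    # thresh[k] = smallest b-position ending a common subsequence of length k+1
    thresh: list[int] = []
    for x in a:
        # decreasing order ensures one element of a is used at most once
        for j in reversed(where.get(x, [])):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)
```

When the strings share few symbols, most elements have empty match lists and the loop does almost no work, which is the case the text says this approach favours.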

Hirschberg [Hirschberg, 1977] developed two other algorithms. One works well when the longest common subsequence is short, and the other works well when it is long. If p is the length of the longest common subsequence, the first runs in O(pn + n log n) time and the second in O(p(m + 1 − p) log n) time.

Nakatsu, et al. [Nakatsu, et al., 1982] have another algorithm that works well for strings with a long common subsequence. Their algorithm compares (m − p)n elements of the strings and computes (p + 1)(m − p + 1) elements of a two-dimensional array, where again p is the length of the longest common subsequence.

Finally, Hsu and Du [Hsu and Du, 1984] presented some improvements to two known algorithms. Where Hirschberg’s algorithm uses a linear search, theirs uses a binary search. They also recommend a faster merging algorithm for part of Hunt and Szymanski’s algorithm.

Several people have worked on bounds on the complexity of the longest common subsequence and string editing problems. Assuming that the only comparisons allowed tell whether two elements of the strings are equal or not equal, Aho, Hirschberg, and Ullman [Aho, Hirschberg, and Ullman, 1976] developed three lower bounds on the number of comparisons needed to solve the longest common subsequence problem. If s is the size of the alphabet and both strings are of length n, then the lower bounds are (s/2)(n + s/2) if s < n, (3/4)ns if n < s < (4/3)n, and n² if (4/3)n < s. If no comparisons between elements of the same string are allowed, the lower bound is n² if s > 3.

Wagner [Wagner, 1975] looked at the extended string editing problem, that is, producing an editing sequence of insertions, deletions, replacements, and transpositions that will convert one string into the other. He let some of the operation costs be infinite and showed that some of these problems are NP-complete.

Wong and Chandra [Wong and Chandra, 1976] used the same comparison model that Aho, Hirschberg, and Ullman used. They also assumed an arbitrarily large alphabet. With these assumptions, the problem of producing an edit sequence with insertions, deletions, and replacements has a lower bound of Ω(mn) on the number of comparisons.

Hirschberg [Hirschberg, 1978] looked at the longest common subsequence problem again. If comparisons between string elements can return a result of less than, equal, or greater than, a lower bound on the number of comparisons needed is n log n, where n is the length of both strings.

Attention has also been given to the problem of comparing trees. Selkow [Selkow, 1977] developed an algorithm patterned after Sankoff’s [Sankoff, 1972] and Wagner and Fischer’s. It allows changing the label of a node and insertion and deletion of leaf nodes. This is not to say that only nodes that are leaves in the original tree may be inserted or deleted, but rather that, at the time a node is inserted or deleted, it must be a leaf. So to delete an interior node, all its descendants must first be deleted. The algorithm takes O(mn) time and space, where m and n are the numbers of nodes in the original and new trees.
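Selkow's restriction can be sketched as a recursion: roots are always matched (possibly with a relabel), and the two child sequences are compared with a Wagner and Fischer style table in which inserting or deleting a child costs the size of its whole subtree, since every descendant must be inserted or deleted one leaf at a time. This is a minimal illustration assuming unit costs; Node, size, and selkow_distance are hypothetical names, and the memoized recursion is not tuned to achieve the O(mn) bound.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple["Node", ...] = ()


@lru_cache(maxsize=None)
def size(t: Node) -> int:
    return 1 + sum(size(c) for c in t.children)


@lru_cache(maxsize=None)
def selkow_distance(u: Node, v: Node) -> int:
    """Unit-cost distance where only leaves may be inserted or deleted."""
    a, b = u.children, v.children
    # edit distance over the child sequences; removing a child removes its subtree
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + size(a[i - 1])
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + size(b[j - 1])
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + size(a[i - 1]),     # delete subtree
                          d[i][j - 1] + size(b[j - 1]),     # insert subtree
                          d[i - 1][j - 1] + selkow_distance(a[i - 1], b[j - 1]))
    return (u.label != v.label) + d[-1][-1]
```

Removing a "then" node that still has one child costs 2 here, because its child must be deleted first; that is exactly the behaviour the later critique of Selkow's algorithm objects to.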

Tai [Tai, 1979] developed a less restrictive algorithm. It allows interior nodes to be inserted or deleted. When an interior node is deleted, its children are attached to its parent in the deleted node’s position. If an interior node is inserted, it may take some of its parent’s children as its own, in such a way that deletion is the inverse of insertion. This algorithm operates in O(mnh²l²) time, where h and l are the heights of the original and new trees.

Wilhelm [Wilhelm, 1981] was interested in finding a mapping between tree nodes that would map every node in the original tree that has a node with the same label in the new tree to some node in the new tree, while preserving as many parent-child links as possible. The algorithm is designed for trees in which all nodes in the original tree have unique labels and all nodes with the same label have the same number of children. Wilhelm gives an analysis of the time the algorithm would take for two types of trees, a complete tree and a degenerate tree, both having all interior nodes with r children. If h is the height of the original tree and n is the number of occurrences in the new tree of the nodes of the original tree, the time for the complete tree is O(n(nr)^h) and the time for the degenerate tree is O(n^(h+1)(r − 1)).

Tichy [Tichy, 1985] has developed an unpublished algorithm for finding differences between trees. The algorithm assumes that each node of the same type has the same number of children. The trees are linearized by taking the preorder traversal. Then an algorithm to find the differences between strings is applied.
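Linearize-then-diff can be sketched as below. Trees are assumed to be (label, children) pairs, Python's difflib.SequenceMatcher stands in for whatever string algorithm Tichy uses, and preorder and tree_diff are hypothetical names. The fixed-arity assumption matters because, when each label has a known number of children, the preorder sequence determines the tree uniquely, so nothing is lost by comparing sequences.

```python
from difflib import SequenceMatcher


def preorder(tree) -> list[str]:
    """Labels in preorder; tree is a (label, [children]) pair."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(preorder(child))
    return out


def tree_diff(old, new) -> list[tuple[str, list[str], list[str]]]:
    a, b = preorder(old), preorder(new)
    ops = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]
```

Because the comparison sees only a flat sequence, changes in several adjacent subtrees come back as one opcode, which is the limitation discussed in Section 3.3.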

3.3. Problems with Existing Work

Although some researchers have recognized the usefulness to programmers of differences between versions of programs, little emphasis has been given to differences. Work with differences has dealt mainly with their use in storing versions so that less space is required than would be if versions were stored in their entirety. For programmers wishing to view differences between program versions, little support beyond rudimentary tools can be found.

Version control systems, though they have the versions that would be compared and sometimes use differences to store these versions, for the most part do not provide facilities for programmers to see differences between versions. Some exceptions, such as RCS and SCCS, exist. Both provide a command which will show the differences between two versions. The commands check out the desired versions and use the UNIX diff command to compare them. It seems that other version control systems do not attempt to help programmers see differences.

When differences are provided to the programmer, they show which sections have had changes made to them, without regard to whether several unrelated changes have been made to a section. Most difference tools also cannot distinguish between an actual change in the program and a change in the formatting. Current algorithms really can do no better than this: with a program represented as text, the algorithms have no basis for deciding anything beyond which sections have changed. With many tools and environments treating programs as trees (abstract syntax trees or parse trees), doing a better job should be possible. The four existing tree comparison algorithms do not seem up to the task.

Wilhelm’s algorithm is clearly inappropriate. The requirement that all nodes in the original tree have unique labels would not be met.

Selkow’s algorithm is not general enough. For both abstract syntax trees and parse trees, interior nodes can be deleted and inserted without all their descendants being deleted. An example of this is changing a repeat statement into a while statement. The statement block, which could be large, would not change.

Tichy’s algorithm is designed more to store versions of trees compactly than to find differences to display. Changes made to several adjacent subtrees would all be one difference to this algorithm. This is the same problem the string comparison algorithms have. Also, the restriction to trees in which all nodes with the same label have the same number of children would generally be a problem. The grammar for a tool using a parse tree might have multiple rules with differing numbers of elements for one nonterminal. Abstract syntax trees, and parse trees using regular right part grammars, would also have nodes with the same label and differing numbers of children. A good example of this is lists of objects.

Tai’s algorithm has the most promise. It is general enough to produce differences for parse trees. However, it may not produce the required information. Because changes can be adjacent, changes that should be divided may still appear as one difference. Alternatively, changes might be reported at a lower level than the person viewing the differences would want. This algorithm is also too general: changes made to a parse tree are more restricted than deleting or inserting arbitrary nodes. It should be possible to devise a faster algorithm specific to the types of changes that occur in parse trees.

4. Experience

The SAGA editor has a simple system which generates differences between versions of a program. The user begins by telling the system to use the version he or she is currently editing as the base version. All differences will then be shown relative to this base version until the user sets another version to be the base.

As the user edits the program, the editor records where changes are made by setting a field in the terminal nodes. The modified field is set in nodes which are inserted and in nodes whose neighbors are deleted. The difference system uses the modified fields to locate the changed sections of the program. The system saves the information about the differences and reuses it if the user asks to see the differences again before he or she makes additional changes to the program.
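The marking scheme can be illustrated on a flat token list. This is a hypothetical model, not the SAGA editor's data structures: Terminal, TokenBuffer, and their methods are invented for the sketch, and the real editor marks terminal nodes of a tree. Insertions mark the new terminal, deletions mark the surviving neighbours, and runs of marked terminals are the changed sections.

```python
from dataclasses import dataclass


@dataclass
class Terminal:
    text: str
    modified: bool = False


class TokenBuffer:
    """Hypothetical flat model of the editor's terminal nodes."""

    def __init__(self, tokens: list[str]):
        self.terms = [Terminal(t) for t in tokens]

    def set_base(self) -> None:
        # the current version becomes the base: clear every mark
        for t in self.terms:
            t.modified = False

    def insert(self, pos: int, text: str) -> None:
        self.terms.insert(pos, Terminal(text, modified=True))

    def delete(self, pos: int) -> None:
        del self.terms[pos]
        # mark the surviving neighbours of the deleted terminal
        for k in (pos - 1, pos):
            if 0 <= k < len(self.terms):
                self.terms[k].modified = True

    def changed_sections(self) -> list[tuple[int, int]]:
        """Maximal runs of marked terminals as (start, end) index pairs."""
        runs: list[tuple[int, int]] = []
        start = None
        for k, t in enumerate(self.terms):
            if t.modified and start is None:
                start = k
            elif not t.modified and start is not None:
                runs.append((start, k))
                start = None
        if start is not None:
            runs.append((start, len(self.terms)))
        return runs
```

Replacing "+" by "*" in "a := b + c ;" as a delete followed by an insert marks "b * c", giving the single section (2, 5).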
The display of the differences is screen-oriented. The differences are displayed one at a time. If the text does not fit on one screen, the user may scroll it up or down. The screen is divided between the part of the difference that shows what the program currently has (the new part) and what it had when the base was set (the old part). These parts can be scrolled independently or together. The system also highlights the tokens which have changed (as opposed to the context which the user requested around the change).

The difference system also includes the potential for an undo command. The user can select a difference to undo. The system will then produce a script, for the editor to execute, which deletes the text in the new part of the difference and reinserts the text that had been in the program (the old part of the difference).
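An undo script of this kind can be sketched over token lists. The names differences, undo_script, and run are hypothetical, difflib.SequenceMatcher stands in for the editor's difference finder, and the script format is invented for illustration. Because the script's positions refer to the current (new) text, one difference can be undone while the others stay in place.

```python
from difflib import SequenceMatcher


def differences(old: list[str], new: list[str]) -> list[tuple[int, int, int, int]]:
    """Changed regions as (old_start, old_end, new_start, new_end)."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, j1, j2) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def undo_script(old: list[str], diff: tuple[int, int, int, int]) -> list[tuple]:
    i1, i2, j1, j2 = diff
    script: list[tuple] = []
    if j2 > j1:
        script.append(("delete", j1, j2))           # remove the new part
    if i2 > i1:
        script.append(("insert", j1, old[i1:i2]))   # restore the old part
    return script


def run(tokens: list[str], script: list[tuple]) -> list[str]:
    tokens = list(tokens)
    for op in script:
        if op[0] == "delete":
            del tokens[op[1]:op[2]]
        else:
            tokens[op[1]:op[1]] = op[2]
    return tokens
```

In the test below, undoing only the second difference restores "1" while keeping the other edit.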

5. Proposal

Many of the features of a good difference system would be straightforward to implement. Either similar features exist in other types of tools, or methods for gathering and using the necessary information are clear.

Other features are not as easy. I want to concentrate on two of these: dividing differences into logical sections and summarizing differences. Three problems related to these are determining the conditions that the sets of nonterminals for dividing differences and for summarizing differences must meet, devising a scheme for storing the methods used to find names for the summarized sections, and determining criteria for deciding at what level summaries of differences should be made.

Many programming environments now include program editors which keep the parse tree or abstract syntax tree for the program. With a tree representation available, it should be possible to use the structure of the trees to divide contiguous changed sections, which would normally be shown as one section, into several more reasonable sections. This problem can be divided into four parts: limiting the subtrees that must be compared, eliminating some subtrees of those subtrees from consideration, finding the differences, and displaying the differences.
Since it seems that tree comparison schemes general enough to use for changes in parse trees are expensive, using one to compare entire trees is impractical. Given the nature of parse trees, a change in the tree will always include a change in the leaves. Also, for parse trees, looking at just the leaves is meaningful. Thus for parse trees it is possible to find the differences in the leaves, considered as strings of tokens, and to use this information to find subtrees which contain changes.

The costs of the tree comparisons depend on the number of nodes in the trees or the heights of the trees. The subtrees compared should be as small as possible, while being large enough to produce useful information. Which subtrees are compared can be based on the string difference between the terminals of the trees. For each differing section, the subtrees in the new and old trees that correspond to the change in the terminals can be found. Since the idea is to present differences in logical sections, treating each changed section of the terminal lists separately seems reasonable.

Several methods can be used to find subtrees for a changed section of the terminal lists. A simple approach would be to find the smallest subtree which contains all the terminals that have changed, and to do this in both the new and old trees. A problem with this approach is that the subtree will not necessarily contain all the changes in the tree structure caused by the change in the terminal list. For example, with an LR(1) parser, the effect of the change on the tree to the left of the changed terminals is limited, but the effect to the right is not. To inform the user of all the ramifications of the change, the subtree should contain all the changed terminals as well as all parts of the tree that changed because of them.
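The simple approach can be sketched in Python. Starting at the root, the search descends into whichever child covers the whole changed range of leaf positions and stops when no single child does. The `Node` class, the shorthand `N`, and the function name are hypothetical, and the range is assumed to be nonempty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    """Hypothetical shorthand for building a parse tree node."""
    return Node(label, kids)


def leaf_count(t):
    return 1 if not t.children else sum(leaf_count(c) for c in t.children)


def smallest_covering_subtree(root, lo, hi):
    """Root of the smallest subtree whose leaves include positions lo..hi-1."""
    node, start = root, 0
    while True:
        for child in node.children:
            n = leaf_count(child)
            if start <= lo and hi <= start + n:
                node = child  # this child covers the whole range; go down
                break
            start += n
        else:
            return node  # no single child covers the range


tree = N("assign", N("x"), N(":="), N("expr", N("x"), N("-"), N("1")))
print(smallest_covering_subtree(tree, 2, 5).label)  # expr
```

As the paragraph above notes, this finds the smallest subtree over the changed terminals only. It does not look for structure to the right that the change may also have altered.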

One way to accomplish this would be to find the subtree based on an incremental parsing algorithm. The incremental parsing algorithm can find a subtree that contains all the changes to the tree caused by a change in the terminal list. For the Ghezzi and Mandrioli algorithm [Ghezzi and Mandrioli, 1980], the subtrees in both the new and old trees can be found, since the nonterminal at which the algorithm stops is the same in both trees. This has the added advantage of yielding two subtrees with the same root to compare. This is not essential, but it is assumed by some tree comparison algorithms, for example Tai’s.

The subtrees found by an incremental parsing algorithm have another advantage: changes which are related but not contiguous will be grouped together into one subtree. Changing a while to a repeat, for example, requires changes that can be widely separated, but the subtree containing the changes associated with changing while to repeat and deleting the condition will also include the change that inserts until and a condition at the end of the loop. Grouping related changes would present differences more reasonably. A problem arising from this is how to recognize unrelated changes that also appear in the subtrees and how to deal with them. If the statement of a while that changed to a repeat also changed, the changes to the statement would be included in the subtrees for the while, but would not be related. Another problem arises if a subtree chosen for one change includes a previous change which has already been grouped and matched. Some way to deal with this would have to be developed. The subtrees obtained from an incremental parsing algorithm thus have some very nice properties, but they also have a potential problem in conveying useful information: the subtrees’ roots may be nonterminals that are meaningless to the user. A question is whether this matters.

The grammar used by the LR parser will contain nonterminals which exist solely to make the language easier to parse. A good example is the nonterminals and production rules added to produce an unambiguous grammar. The user will not care about seeing differences based on all the nonterminals of the grammar. Even if all the nonterminals represented unique entities, the user would not want to see differences based on all of them, since that would present differences on too fine a scale. Thus the information shown to the user should be based on some subset of the nonterminals of the grammar in which the user will be interested.

If having the roots of the subtrees be “interesting” nonterminals is important, such subtrees could be obtained in several ways. One possibility is to find the smallest subtree which contains the whole changed section of the terminals and whose root is an interesting nonterminal, and to do this in both the new and old trees. This would no longer guarantee that all parts of the trees affected by the change in terminals were included in the subtrees. However, since the purpose of the comparison is to display changes to the user in a logical fashion, and not to record changes to the parse tree per se, this may not matter. The advantage of grouping related changes into one subtree would be obtained with this method as well: all the changes to an entity in which the user is interested would be in the subtrees. As with finding the subtrees based on an incremental parsing algorithm, this method could find a subtree which includes an earlier change that has already been grouped and matched, or a change lying between two related changes.
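A Python sketch of the interesting-root variant, under stated assumptions. The tree representation and function names are hypothetical, and which nonterminals count as interesting is left to the caller. The sketch records the path from the root down to the node covering the changed range, then walks back up that path and returns the nearest node whose label is in the interesting set. Falling back to the whole tree when no label qualifies is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


def leaf_count(t):
    return 1 if not t.children else sum(leaf_count(c) for c in t.children)


def covering_path(root, lo, hi):
    """Nodes from the root down to the smallest subtree covering leaves lo..hi-1."""
    path, node, start = [root], root, 0
    while True:
        for child in node.children:
            n = leaf_count(child)
            if start <= lo and hi <= start + n:
                node = child
                path.append(child)
                break
            start += n
        else:
            return path


def interesting_root(root, lo, hi, interesting):
    """Smallest subtree covering the range whose root label is interesting."""
    for node in reversed(covering_path(root, lo, hi)):
        if node.label in interesting:
            return node
    return root  # assumption: fall back to the whole tree


prog = N("block", N("begin"),
         N("stmt_list", N("stmt", N("x"), N(":="),
                          N("expr", N("x"), N("-"), N("1")))),
         N("end"))
print(interesting_root(prog, 5, 6, {"stmt", "block"}).label)  # stmt
```

Changing only the literal 1 yields the enclosing `stmt` rather than the uninteresting `expr` or leaf.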

One advantage that selecting the subtrees based on an incremental parsing algorithm has over selecting them based on interesting nonterminals is that the roots of the subtrees will be the same. This is not essential even for Tai’s algorithm, since artificial matching roots can be added to the subtrees. However, it may be desirable. If so, in both the new and old trees, one could choose the smallest subtree which contains the changed terminal section and whose root is an interesting nonterminal matching the root of the subtree from the other tree. This poses some problems. Let $N_N$ be the root of the smallest subtree containing all the changed terminals in the new tree, and let $N_O$ be defined similarly for the old tree. Let $n_N$ be the number of ancestors of $N_N$ which are interesting nonterminals, and let $n_O$ be defined likewise for $N_O$. Then in the worst case finding matching ancestors would take time $O(n_N n_O)$. Another problem is choosing between multiple matches. If $N_N$ matches $N_O$, $M_N$ matches $M_O$, $M_N$ is an ancestor of $N_N$, and $M_O$ is an ancestor of $N_O$, the choice is clear. But if instead $N_N$ is an ancestor of $M_N$, the choice is not obvious. Some criteria for choosing would have to be developed.
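The worst-case cost shows up in a direct Python sketch that compares every interesting ancestor on one side with every one on the other. Each chain here is a hypothetical list of the labels of the interesting ancestors, nearest first. A pair is the clear choice only when it is nearest in both trees at once. Otherwise the function returns `None`, which marks the ambiguous case for which criteria would still have to be developed.

```python
def matching_pairs(new_chain, old_chain):
    """All (i, j) with equal labels; the nested loops take O(n_N * n_O) time."""
    return [(i, j)
            for i, a in enumerate(new_chain)
            for j, b in enumerate(old_chain)
            if a == b]


def clear_choice(pairs):
    """The pair nearest in both trees, or None when the matches cross."""
    for p in pairs:
        if all(p[0] <= q[0] and p[1] <= q[1] for q in pairs):
            return p
    return None


print(clear_choice(matching_pairs(["expr", "stmt", "block"], ["stmt", "block"])))  # (1, 0)
print(clear_choice(matching_pairs(["stmt", "block"], ["block", "stmt"])))  # None
```

The second call is the crossed case: the match nearer the change in the new tree is farther from it in the old tree.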

Another way to get roots for the new and old subtrees that are interesting nonterminals would be to combine finding subtrees with an incremental parsing algorithm and finding nodes that are interesting nonterminals. The incremental parsing algorithm could be used to find the new and old subtrees which contain all the changes to the parse tree caused by the changed section of the terminal list. Then, in both the new and old trees, the first ancestor of the root of this subtree which is an interesting nonterminal could be found and taken as the root of the subtree for comparison. This combines most of the advantages of the two methods.

One advantage not achieved by combining the methods is that of obtaining subtrees for comparison which have the same root. As when subtrees were found based solely on interesting nonterminals, this could be remedied by finding, in both trees, the smallest subtree which contains the subtree found by the incremental parsing algorithm and whose root is the same interesting nonterminal as the root of the subtree in the other tree. This of course has the same problems as before.

The discussion so far has mentioned potential advantages of the various methods, but no disadvantages. Aside from not possessing all the advantages of another method, the only area in which disadvantages arise seems to be the size of the subtrees obtained. The purpose of restricting the subtrees for comparison is to decrease the amount of space and time required. Since some of these techniques produce larger subtrees, they do not accomplish this major goal as well as other techniques. Trying to get interesting nonterminals that match, or an interesting nonterminal whose subtree contains the subtree found by an incremental parsing algorithm, will necessarily produce larger subtrees than some of the other methods.

Many methods can be employed to limit the subtrees that must be compared. Which is best depends on several factors. First, various methods can be employed in the subsequent steps, and one method for limiting the subtrees might work best with one method of finding the differences, while another might work best with another. A second factor might be the particular grammar used for the parse tree. If a situation in which one method performs better than another never or rarely arises with a particular grammar, the method producing the smaller trees for comparison would be better. A third factor is the set of nonterminals that make up the interesting nonterminals. Some requirements must be imposed on the set, and what these requirements are will affect the outcome of the limiting process. Further, given a set of requirements, different sets satisfying the requirements may cause one method to perform better than another. Finally, the expectations of the user of the system will affect the choices. A user willing to accept occasional odd results from a faster system will prefer a different method than a user who demands perfection no matter what the cost. It will be interesting to compare these methods to see which result in the smallest trees, the fastest comparisons, the fewest odd results, and the most information.

The next step in dividing differences into logical sections is to eliminate subtrees from the subtrees found in the first step. For clarity, call the subtrees that are selected for comparison the trees, so that subtrees will refer to the subtrees of these trees that are to be eliminated from consideration in finding differences.

Several reasons for eliminating subtrees exist. One reason is similar to the reason for limiting subtrees: the time and space required to find the differences should be less, since the trees to compare will have fewer nodes. Another benefit of eliminating subtrees is better results from the tree comparison. If some section of terminals is unchanged and the trees above them match, the sections should probably be matched and reported that way. Depending on the tree comparison algorithm and the tree structure, these sections may or may not be reported as changed. A final advantage of eliminating subtrees is that it might make more of the tree comparison algorithms applicable. For example, Selkow’s algorithm could be used, but it will report sections that have not changed as changed (this will be explained when the third step, comparing the trees, is examined). This would probably make the algorithm unusable unless eliminating subtrees matches enough unchanged subtrees that Selkow’s algorithm reports few spurious changes.

Several methods could be employed to eliminate subtrees. One could start from the leaves and go up as long as the trees were the same. This would start from unchanged sections of the terminal list, which could be in the trees to compare because of the grouping of related changes. An example is the statement of the while when a while is changed to a repeat. Starting from the terminals, the trees above could be compared until they differ. Parts of the unchanged sections at the left and right edges may be dropped from the subtree, because the parts of the tree above them do not match or include parts of the changed section of the terminal lists. The subtrees obtained should be the largest subtrees in the trees that contain only unchanged terminals as leaves and which are the same in both trees. These matching subtrees could be found for each section of unchanged terminals included in the trees.
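A Python sketch of this elimination step, under stated assumptions. It aligns the leaves with `difflib.SequenceMatcher`, then searches the old tree from the top down. A subtree is kept only when all of its leaves are unchanged, they map to a contiguous run in the new tree, and the new tree has an identical subtree over exactly that run. Searching top-down and stopping at the first match yields the largest such subtrees. The representation and all names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


def leaves(t):
    return [t.label] if not t.children else [x for c in t.children for x in leaves(c)]


def spans(t, start=0, out=None):
    """Map each leaf span (lo, hi) to the nodes of t covering exactly that span."""
    out = {} if out is None else out
    out.setdefault((start, start + len(leaves(t))), []).append(t)
    for c in t.children:
        spans(c, start, out)
        start += len(leaves(c))
    return out


def matched_subtrees(old, new):
    """Largest identical subtrees whose leaves are all unchanged."""
    matcher = SequenceMatcher(a=leaves(old), b=leaves(new), autojunk=False)
    to_new = {i + k: j + k
              for i, j, size in matcher.get_matching_blocks()
              for k in range(size)}
    new_spans = spans(new)
    found = []

    def walk(t, lo):
        hi = lo + len(leaves(t))
        if all(p in to_new for p in range(lo, hi)):
            # to_new is increasing, so equal end-to-end width means contiguous
            if to_new[hi - 1] - to_new[lo] == hi - 1 - lo:
                span = (to_new[lo], to_new[lo] + hi - lo)
                for cand in new_spans.get(span, []):
                    if cand == t:  # dataclass equality compares whole subtrees
                        found.append((t, cand))
                        return  # keep only the largest match; do not descend
        for c in t.children:
            walk(c, lo)
            lo += len(leaves(c))

    walk(old, 0)
    return found


old = N("loop", N("while"), N("cond", N("x"), N(">"), N("0")), N("do"),
        N("stmt", N("y"), N(":="), N("1")))
new = N("loop", N("repeat"), N("stmt", N("y"), N(":="), N("1")), N("until"),
        N("cond", N("x"), N("="), N("0")))
print([t.label for t, _ in matched_subtrees(old, new)])  # ['stmt']
```

In the while-to-repeat example, the loop's statement is found as one matched subtree and can be removed before the trees are compared.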

Another possibility is to find a series of trees that match, rather than just one for each unchanged section of terminals. This would presumably remove more of the tree from the part that must be compared, which should make the comparison faster. However, the larger number of already matched sections with which the tree comparison would have to deal might offset this gain. The subtrees that are matched might be small, which would tend to make bookkeeping more expensive for little benefit. It would also tend to make reporting the differences to the user more complex: the user would see many small matches in a difference, which could obscure the real change.

As with limiting the trees to be compared, a question with eliminating subtrees is whether the root of the subtree should be a nonterminal that is interesting to the user. The report of the change would probably be more meaningful if it mentions the unchanged sections. The subtrees matched would then be the largest subtrees which contain only unchanged terminals, which match, and whose roots are interesting nonterminals, or a series of such subtrees. The matching subtrees would be smaller if an interesting nonterminal must be the root. For a series of matched subtrees, requiring the roots to be interesting nonterminals might be better: depending upon the set of interesting nonterminals, the matched subtrees could be of reasonable size, and the worry about too much overhead for too little benefit and a confusing display for the user could disappear.

Another possibility for eliminating matching subtrees is based on incremental parsing. Ghezzi and Mandrioli’s incremental LR(0) parser contains a section which will reuse parts of the tree to the right of the change. The subtrees that are reused would contain, at least in part, trees that are the same. One problem with the reused trees is that parts of them can contain changed terminals. These parts could be avoided by taking the largest subtree that is reused but does not contain any changed terminals. A series of such subtrees could also be found with this method.

One question that arises in finding the trees for comparison, but not in finding subtrees for elimination, is whether the roots of the subtrees should match. Because eliminated subtrees match in their entirety, their roots necessarily match.

An issue to examine for eliminating subtrees is how it is affected by the method used in the previous step and how it affects subsequent steps. It should not be affected by what are selected as roots of the trees to compare, that is, by what method is used to find the trees. Using similar methods in both steps might, however, produce better, more internally consistent results. For example, if the roots of the trees to compare are selected to be interesting nonterminals, the roots of the subtrees to eliminate could also be selected to be interesting nonterminals.

The method used to find the subtrees to eliminate will affect the subsequent steps. If subtrees are eliminated, then the methods for finding the differences must handle subtrees that have already been matched.

If a series of matched subtrees is eliminated, the comparisons must account for that, so the comparison method used must change depending upon what is done in the elimination step. Another effect of the elimination step could be to improve the results of the tree comparison.

Eliminating subtrees could also affect how the differences are displayed. If subtrees are eliminated, either the text of the eliminated subtrees or a short representation of the matched subtrees can be displayed. Also to be decided is whether subtrees that are eliminated from the comparison should be treated differently from parts of the trees that the comparison says are the same.

Eliminating subtrees can be done in several ways. I want to try the methods for this in combination with the methods for selecting trees for comparison to see which produce the best results. Another possibility to compare is not eliminating subtrees at all, which will help show whether effort on elimination is really useful.

The third step in dividing the differences into logical sections is the actual comparison of the trees. Many possibilities exist for this step. The discussion below examines how three existing algorithms might perform in grouping and separating differences if subtrees are not eliminated. It then considers what implications the elimination of subtrees has for these algorithms, what other possibilities for comparing the parse trees exist, and, finally, what forms of trees might be useful for the tree comparisons.

The results the tree comparison algorithms produce should group related differences and separate unrelated differences. Another problem with which the algorithms would have to deal is a changed section that contains parts that need to be grouped and parts that need to be separated. This might present more difficulties for the algorithms and should be considered in evaluating which is best, but it does not seem to add enough to the simple analysis here to be considered now.

One of the existing algorithms is that developed by Selkow. This algorithm allows insertions and deletions only at leaves. For grouping changes to the same structure, this algorithm will work well. If an if is changed to a while, for example, the algorithm will report that the entire subtree constituting the if must be deleted and the entire subtree for the while inserted. This is a result of the requirement that only leaves may change. The tree will have a node that indicates an if, and this will have to change to one that indicates a while. For that node to change, it must become a leaf, so all its children must be deleted, the node changed, and then all the children of the new node inserted. If a few more internal nodes that are present to make the grammar more amenable to parsing also change, they will also need to be inserted or deleted. The whole change from an if to a while would still be one group of nodes to delete and one group to insert, so the related changes would be grouped together. One problem with this is that the fact that the condition and statement have not changed is not detected. The user would have to scan the text, which could be a considerable amount, to determine whether anything besides the if to while had changed.
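The grouping behaviour described above can be reproduced with a small top-down distance in the spirit of Selkow's algorithm. This Python sketch follows the discussion's assumptions rather than Selkow's published procedure. A node is kept only when its label is unchanged. Otherwise its whole subtree is deleted and the new one inserted, at a cost equal to the number of nodes. The children of two kept nodes are aligned by an ordinary sequence edit distance. All names are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


@lru_cache(maxsize=None)
def size(t):
    return 1 + sum(size(c) for c in t.children)


@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Whole subtrees may only be removed or added below nodes kept at the same level."""
    if a.label != b.label:
        return size(a) + size(b)  # delete all of a, insert all of b
    xs, ys = a.children, b.children
    # d[i][j]: cost of turning the first i children of a into the first j of b
    d = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        d[i][0] = d[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        d[0][j] = d[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            d[i][j] = min(d[i - 1][j] + size(xs[i - 1]),
                          d[i][j - 1] + size(ys[j - 1]),
                          d[i - 1][j - 1] + top_down_distance(xs[i - 1], ys[j - 1]))
    return d[-1][-1]


cond, stmt = N("expr", N("x")), N("stmt", N("y"))
old = N("prog", N("if_stmt", N("if"), cond, N("then"), stmt))
new = N("prog", N("while_stmt", N("while"), cond, N("do"), stmt))
print(top_down_distance(old, new))  # 14
```

The result, 14, is the size of the whole old if subtree (7 nodes) plus the whole new while subtree (7 nodes), even though the condition and statement are untouched.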

Selkow’s algorithm may or may not separate unrelated changes, depending upon the tree structure. If the levels of nodes in changed structures do not change, then Selkow’s algorithm will match those nodes, and the changes to the contiguous structures can come out as separate changes to the tree. However, if the levels of nodes common to both the new and old trees change, the algorithm will say to delete and insert everything, and the separate changes would come out looking like one change.

The second tree comparison algorithm is Tai’s. It allows insertions and deletions anywhere in the tree. For grouping differences this would not work well. In the if to while example, Tai’s algorithm would report three changes: deleting if and inserting while, deleting then and inserting do, and deleting and inserting the nonterminals that indicate an if and a while. These would all be reported as separate changes, so Tai’s algorithm does not help to group related changes. It would, however, report the condition and statement as unchanged, which is an advantage over Selkow’s algorithm.
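For contrast, this Python sketch computes the general ordered tree edit distance that Tai's algorithm addresses, in which any node may be deleted, inserted, or relabeled. It does not follow Tai's own procedure. Instead it uses a standard recursive formulation over ordered forests, memoized, which is only practical for small trees. Two further assumptions are unit costs, and counting a relabel as one change where the text speaks of a deletion plus an insertion. The names are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between ordered forests f and g (tuples of Nodes), unit costs."""
    if not f and not g:
        return 0
    if not f:  # insert the rightmost root of g; its children take its place
        return forest_distance(f, g[:-1] + g[-1].children) + 1
    if not g:  # delete the rightmost root of f
        return forest_distance(f[:-1] + f[-1].children, g) + 1
    v, w = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v.children, g) + 1,  # delete v
        forest_distance(f, g[:-1] + w.children) + 1,  # insert w
        forest_distance(v.children, w.children)       # map v to w ...
        + forest_distance(f[:-1], g[:-1])
        + (v.label != w.label),                       # ... relabeling if needed
    )


def tree_distance(a, b):
    return forest_distance((a,), (b,))


if_t = N("if_stmt", N("if"), N("expr", N("x")), N("then"), N("stmt", N("y")))
while_t = N("while_stmt", N("while"), N("expr", N("x")), N("do"), N("stmt", N("y")))
print(tree_distance(if_t, while_t))  # 3
```

The three unit changes are the relabelings of `if_stmt`, `if`, and `then`. The condition and statement subtrees map unchanged, and nothing ties the three changes together into one group.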

Tai’s algorithm should perform better at separating differences. Since it can match nodes at any level, it would not be affected in the way Selkow’s algorithm is by changes that affect the level of unchanged nodes. If two changes are contiguous but unrelated, the unchanged sections should match, which will put changes to the separate structures into separate differences. If the changes are to some unit which is meaningless to the user, more needs to be done to translate them into a change the user would understand.

The third tree comparison algorithm that could be used is Tichy’s. This algorithm assumes that all nodes with the same label have the same number of children. It puts the trees into a linear representation, such as the preorder traversal, and uses a string comparison algorithm on the linear representation. If this algorithm were appropriate for parse trees, its results would be similar to those from Tai’s algorithm. Nodes would match no matter what their level. Changes would not be grouped together. They would be separated, but not necessarily into units that the user would understand.
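The linearize-then-compare idea can be sketched in Python. Each tree is flattened to its preorder label sequence, and the sequences are compared with `difflib.SequenceMatcher`, standing in for whatever string comparison would actually be used. This illustrates the idea only, not Tichy's published algorithm, and the tree representation is hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


def preorder(t):
    return [t.label] + [x for c in t.children for x in preorder(c)]


def preorder_diff(old, new):
    """Non-equal opcodes from comparing the two preorder label sequences."""
    a, b = preorder(old), preorder(new)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


if_t = N("if_stmt", N("if"), N("expr", N("x")), N("then"), N("stmt", N("y")))
while_t = N("while_stmt", N("while"), N("expr", N("x")), N("do"), N("stmt", N("y")))
for change in preorder_diff(if_t, while_t):
    print(change)
```

The two differences come out separately: one for the statement head and keyword, and one for then and do. This is the separated but ungrouped result described above.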

A question is whether the assumption that all nodes with the same label have the same number of children is appropriate for parse trees. The most useful labels for parse tree nodes are the terminals and nonterminals that they represent. Nodes that represent the same nonterminal can have different numbers of children, because a nonterminal can be on the left-hand side of many production rules, which could have right-hand sides of various lengths. A possible solution could be to use the rule number as the label of the node rather than the nonterminal, if this information is available in addition to or instead of the nonterminal. It is likely, however, that the user would be interested in changes in nonterminals, not rule numbers. This problem might be alleviated by using rule numbers for the initial comparison and then doing further matching on nonterminals. Using rule numbers would not help at all if the parse trees were from a regular right part grammar.

A solution similar to using rule numbers would be to make the unit of comparison a nonterminal together with its number of children, rather than just the nonterminal or rule number. This would also need further comparisons. Unlike using rule numbers, it would be applicable to regular right part grammars.

Another possible solution is to include a special mark element after the rightmost child of every interior node, then treat these elements just like the string comparison elements that are terminals and nonterminals. This could lead to some strange matches, but as long as these happened rarely, they could be tolerable.
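A short Python sketch shows why the same-arity assumption matters and how the two remedies restore it. The two trees below have the same plain preorder sequence. Labeling each node with its number of children, or appending a mark element after each interior node's last child, makes their sequences differ. The mark symbol `)` and all names are hypothetical. `rebuild` checks that the label-plus-arity sequence determines the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


MARK = ")"  # hypothetical mark element; assumed not to be a grammar symbol


def preorder(t):
    return [t.label] + [x for c in t.children for x in preorder(c)]


def preorder_with_arity(t):
    """Each element is (label, number of children)."""
    return [(t.label, len(t.children))] + [
        x for c in t.children for x in preorder_with_arity(c)]


def preorder_with_marks(t):
    """Plain preorder, with MARK appended after the last child of each interior node."""
    out = [t.label]
    for c in t.children:
        out += preorder_with_marks(c)
    if t.children:
        out.append(MARK)
    return out


def rebuild(seq):
    """Inverse of preorder_with_arity."""
    items = iter(seq)

    def take():
        label, k = next(items)
        return Node(label, tuple(take() for _ in range(k)))

    return take()


t1 = N("L", N("L", N("a")), N("b"))
t2 = N("L", N("L", N("a"), N("b")))
print(preorder(t1) == preorder(t2))  # True
print(preorder_with_marks(t1), preorder_with_marks(t2))
```

Either encoding can be fed to a string comparison. The marks, however, can themselves be matched against one another, which is the source of the strange matches mentioned above.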

A possible way to avoid strange matches would be to restrict matches that cross mark elements and not to treat the mark elements as normal string elements. The algorithm would have to change to handle this. A problem could be that this requirement would be as restrictive as Selkow’s algorithm, or more so: it could require nodes to be at the same level and in the same numerical position in the list of children to match.

If subtrees are eliminated, then the tree comparison algorithms would have to account for this. At this point the concept of a trace is useful. For string comparisons, a trace matches the elements that are unchanged. The trace is a set of ordered pairs giving the positions in the strings that are matched. For the strings Saturday and Sunday, numbering positions from 1, a trace would be

$\{(1, 1), (4, 2), (6, 4), (7, 5), (8, 6)\}$

which draws a line from each of S, u, d, a, and y in Saturday to the same letter in Sunday. In a trace, none of the lines can cross; that is, if $(i, j)$ and $(m, n)$ are in the trace, then $i < m$ if and only if $j < n$. Thus a set of pairs in which two of the lines cross would not be a trace. Traces for trees can also be defined. For trees, if the nodes are listed in preorder, no lines would cross.
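The trace condition is easy to state in code. This Python sketch, with hypothetical function names, checks a set of 1-based position pairs against the condition. It also derives one trace for Saturday and Sunday from the matching blocks of `difflib.SequenceMatcher`. That matcher does not promise a longest common subsequence, but the blocks it returns never cross.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_trace(pairs):
    """True if no position is reused and no two lines cross (i < m iff j < n)."""
    return all(i != m and j != n and (i < m) == (j < n)
               for (i, j), (m, n) in combinations(pairs, 2))


def trace_from_diff(a, b):
    """A trace for strings a and b, as 1-based position pairs."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(i + k + 1, j + k + 1)
            for i, j, size in matcher.get_matching_blocks()
            for k in range(size)]


t = trace_from_diff("Saturday", "Sunday")
print(t)                           # [(1, 1), (4, 2), (6, 4), (7, 5), (8, 6)]
print(is_trace(t))                 # True
print(is_trace([(2, 5), (4, 2)]))  # False: a->a crosses u->u
```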


The tree comparison algorithms restrict themselves to differences that produce a trace. If this is to remain true when parts of the trees are matched before the comparison is done, then the tree comparison algorithms must be changed. This might be just another dynamic programming problem. For Tichy’s algorithm, which uses a string comparison, this is no problem. If only one pair of subtrees is matched, the strings would just be divided into two pairs of strings to compare. If more than one pair is matched, the strings are divided into more pairs, each compared without regard to the others. This is not as simple a problem for the other two tree comparison algorithms.
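For the string-based approach the division is mechanical. In this Python sketch, a hypothetical helper takes two linearized trees and the ranges matched beforehand, and compares each gap between those ranges on its own. The gaps are compared with `difflib.SequenceMatcher`, and reported indices are translated back to positions in the full sequences. The matched ranges are assumed to be in increasing order in both sequences, so that together they form a trace.

```python
from difflib import SequenceMatcher


def compare_around_matches(a, b, matched):
    """Diff a and b separately in each gap between pairs matched beforehand.

    matched lists ((a_lo, a_hi), (b_lo, b_hi)) half-open ranges, in increasing
    order in both sequences.
    """
    diffs, ai, bi = [], 0, 0
    sentinel = ((len(a), len(a)), (len(b), len(b)))
    for (a_lo, a_hi), (b_lo, b_hi) in list(matched) + [sentinel]:
        gap = SequenceMatcher(a=a[ai:a_lo], b=b[bi:b_lo], autojunk=False)
        for tag, i1, i2, j1, j2 in gap.get_opcodes():
            if tag != "equal":
                diffs.append((tag, ai + i1, ai + i2, bi + j1, bi + j2))
        ai, bi = a_hi, b_hi  # skip over the matched pair
    return diffs


a = ["p", "q", "M1", "M2", "r"]
b = ["p", "s", "M1", "M2", "t"]
print(compare_around_matches(a, b, [((2, 4), (2, 4))]))
```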


Another possibility is simply to eliminate the matched subtrees and not worry about lines in the trace that cross. This might help locate parts of the tree that moved, but only in a limited way. It would also make reporting of the changes to the user more complicated. This solution for handling matched subtrees would be interesting to compare with other methods.

Matching subtrees could help make the tree comparisons more informative. It would help Selkow’s algorithm with grouping differences, as mentioned before. It might help Tai’s and Tichy’s algorithms as well: once subtrees that might separate related changes are removed, these changes may become one difference to these algorithms. This is not necessarily true, however. Depending upon how trees are selected, unchanged parts of the tree could be left separating the related changes. This can happen, for example, if the root must be an interesting nonterminal, or if only one pair of subtrees is matched so that unchanged parts outside this subtree are left to the tree comparison algorithm. Then Tai’s and Tichy’s algorithms would still report the related changes separately.

Eliminating subtrees might hinder separating unrelated differences in much the same way that it would help group related differences. If all the matching nodes between two unrelated changes are eliminated, the changes could become one difference again.

Besides the tree comparison algorithms and modifications to them, several other methods for comparing trees are possible. These include developing a tree comparison algorithm for parse trees rather than trees in general, using just the string comparison on the terminals in conjunction with the location of interesting nonterminals, using a string comparison on strings of interesting nonterminals, and doing nothing except possibly eliminating matching subtrees.

A tree comparison algorithm for parse trees might be better than algorithms intended for any tree. Selkow’s and Tichy’s algorithms have some problems because their assumptions do not fit parse trees, and Tai’s algorithm uses much time and space. An algorithm designed for changes to parse trees might perform better. However, it might have the same problems as Tai’s algorithm in grouping and separating differences. Also, such an algorithm might be too dependent on the grammar to be useful in general.

Basing the differences on the string differences of the tokens and the tree structure, without any type of tree comparison, might produce reasonable results. The tree comparison algorithms do not seem to produce the results needed to group and separate differences. Once the tree comparison is done, further processing is needed to produce something which will group and separate differences and be meaningful to the user. A reasonable question is whether the added knowledge of what tree structure changed provides information that would help with this and that the string difference of the tokens would not give. Even if it does give some useful information, the tree comparisons may not be worth the added expense unless the resulting displays for the user are significantly better than those produced from the string difference and tree structure alone.

Another possibility, which uses more information from the tree than comparing the strings does, is to get strings of interesting nonterminals based on the string difference of the terminals and to use a string comparison algorithm on them. A possible method for getting a list of interesting nonterminals for a changed section of terminals is as follows. First, get the root of the smallest subtree that contains the first changed terminal and whose root is an interesting nonterminal. Then eliminate the terminals in that subtree and find the interesting nonterminal for the reduced list of terminals. Continue until no more changed terminals remain in this section. Get the string of interesting nonterminals in this way for both the new and old trees, and apply a string comparison algorithm to these. Basing the cost of changing one interesting nonterminal into another on the terminals in the trees rooted at the interesting nonterminals, or on the trees’ structure, may be worthwhile. This is similar to the algorithm for finding differences between screen displays which Gosling presents [Gosling, 1981].
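A Python sketch of this procedure, under stated assumptions: the tree representation, the set of interesting nonterminals, and the cost function are all hypothetical. `interesting_units` repeatedly takes the smallest interesting subtree containing the first remaining changed terminal, then skips past that subtree's terminals. `unit_distance` compares the two resulting strings with an ordinary edit distance. Its substitution cost depends on the terminals under each unit, loosely in the manner of Gosling's line costs, though this is not his algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def N(label, *kids):
    return Node(label, kids)


def leaves(t):
    return [t.label] if not t.children else [x for c in t.children for x in leaves(c)]


def interesting_units(root, lo, hi, interesting):
    """(label, terminals) of interesting subtrees covering changed leaves lo..hi-1."""
    units, p = [], lo
    while p < hi:
        node, start = root, 0
        best, best_start = root, 0  # assumption: the root if nothing is interesting
        while node.children:
            for child in node.children:
                n = len(leaves(child))
                if start <= p < start + n:
                    node = child  # descend toward leaf p
                    break
                start += n
            else:
                break
            if node.label in interesting:
                best, best_start = node, start
        units.append((best.label, tuple(leaves(best))))
        p = best_start + len(leaves(best))  # skip the terminals just covered
    return units


def unit_cost(u, v):
    """Hypothetical cost: 0 if identical, 1 for the same construct with new text."""
    if u == v:
        return 0
    return 1 if u[0] == v[0] else 2  # 2 is no better than delete plus insert


def unit_distance(xs, ys):
    """Sequence edit distance; inserting or deleting a unit costs 1."""
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(len(ys) + 1)]
         for i in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + unit_cost(xs[i - 1], ys[j - 1]))
    return d[-1][-1]


old = N("block", N("stmt", N("a"), N("="), N("1")), N("stmt", N("b"), N("="), N("2")))
new = N("block", N("stmt", N("a"), N("="), N("1")), N("stmt", N("b"), N("="), N("3")),
        N("stmt", N("c"), N("="), N("4")))
xs = interesting_units(old, 5, 6, {"stmt"})
ys = interesting_units(new, 5, 9, {"stmt"})
print(xs, ys, unit_distance(xs, ys))
```

The changed ranges used here, leaves 5..6 of the old tree and 5..9 of the new, are what a token diff of the two leaf lists gives. The comparison then reports one modified statement and one inserted statement, for a distance of 2.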

One other possibility is to find the trees to compare, eliminate whatever subtrees should be eliminated, and call that the difference. This would work well for grouping differences, but not at all for separating them. It would be interesting to compare this with the other methods.

In addition to the method used to compare the parse trees and generate the differences, a consideration is what form of the trees to compare. Some possibilities are to treat lists of items in the grammar differently from other tree structures, to use only interesting nonterminals for the comparison, or to use abstract syntax trees.

Many languages have lists of elements, such as lists of statements or lists of declarations. They will
usually be represented in the grammar by rules like

item ::= ...

list-of-items ::= ε | item list-of-items

or

list-of-items ::= item | item list-of-items

or

list-of-items ::= ε | list-of-items item

or

list-of-items ::= item | list-of-items item

depending upon whether the list can be empty and whether the production rule is left- or right-recursive.
The trees produced by these rules will be narrow and tall. If an item changes, the trees chosen as containing
the change will contain all the items before or after the one that was changed, inserted, or deleted,
depending upon whether the grammar is left- or right-recursive, respectively. This list could be quite long,
and, depending upon how well subtree elimination worked, the trees to be compared could be quite large.
If several items in the list were changed, inserted, or deleted, the entire list or a large part of it would
need to be compared. The question is whether it is better to recognize that the elements the comparison
must deal with form a list of items and compare them as lists, or to ignore the special nature of the trees
and compare them as trees. Using a string comparison algorithm would allow something like the method
Gosling suggests for comparing display screens, that is, using a comparison to find the cost of converting
an element of one list into an element of the other. Treating tree structures that represent lists as lists
rather than trees could produce a better comparison or produce it more efficiently.
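To make the list question concrete, the sketch below (Python; the Node class and the item and list-of-items labels are assumptions) flattens either recursive form back into a plain sequence and then compares the two sequences with the standard library's difflib.SequenceMatcher, so a single changed item is reported as one replacement however tall the tree is. The flattening and the tree builders use loops rather than recursion, because tall list trees could otherwise exceed Python's recursion limit.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    text: str = ""

LIST = "list-of-items"

def right_recursive(texts):
    """Tall tree for  list-of-items ::= ε | item list-of-items."""
    node = Node(LIST)                     # the empty (ε) list at the bottom
    for text in reversed(texts):
        node = Node(LIST, [Node("item", text=text), node])
    return node

def left_recursive(texts):
    """Tall tree for  list-of-items ::= ε | list-of-items item."""
    node = Node(LIST)
    for text in texts:
        node = Node(LIST, [node, Node("item", text=text)])
    return node

def flatten(node, list_label=LIST):
    """Items of a left- or right-recursive list, in source order."""
    items, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.label == list_label:
            stack.extend(reversed(n.children))  # visit children left to right
        else:
            items.append(n)
    return items

def list_diff(old_tree, new_tree):
    """Compare two lists of items as sequences rather than as trees."""
    old = [n.text for n in flatten(old_tree)]
    new = [n.text for n in flatten(new_tree)]
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]
```

Because both recursive forms flatten to the same sequence, the comparison no longer depends on which way the grammar recurses.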

Another possibility for comparing the parse trees is to involve only the interesting nonterminals in
the comparison. Since the user will want to see differences only in terms of the interesting nonterminals,
comparing the trees in these terms seems reasonable. For the comparison, all the nodes in the trees except
the interesting nonterminals and the terminals could be ignored. This new form of the parse tree could
treat all the terminals and interesting nonterminals as children of their closest ancestor that is an
interesting nonterminal. Looking only at interesting nonterminals could have three advantages. Because
the uninteresting nonterminals are not involved in the comparison, the tree comparisons would handle
fewer nodes and would be faster. Since only interesting nonterminals are considered, no further
processing should be needed after the comparison to put the differences into terms meaningful to the user.
Finally, the tree comparison algorithms might have fewer problems grouping and separating
differences.
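The projected form of the tree can be sketched as follows, again with a hypothetical Node class and a caller-supplied set of interesting labels. Uninteresting nonterminals are spliced out, so every kept node becomes a child of its closest interesting ancestor; as a simplification, any node without children is treated as a terminal.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    text: str = ""

def project(node, interesting):
    """Copy of the tree keeping only the root, the interesting nonterminals
    and the terminals; each kept node hangs from its closest kept ancestor."""
    if not node.children:
        return Node(node.label, text=node.text)
    return Node(node.label, _kept(node, interesting))

def _kept(node, interesting):
    out = []
    for child in node.children:
        if not child.children or child.label in interesting:
            out.append(project(child, interesting))
        else:  # uninteresting nonterminal: splice its kept descendants in
            out.extend(_kept(child, interesting))
    return out

def size(node):
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in node.children)

# Example: statement lists and expressions exist only to simplify parsing.
def tok(label, text):
    return Node(label, text=text)

def assign(name, value):
    return Node("stmt", [tok("id", name), tok(":=", ":="),
                         Node("expr", [tok("num", value)])])

tree = Node("block", [Node("stmt_list", [assign("x", "1"), tok(";", ";"),
                                         Node("stmt_list", [assign("y", "2")])])])
projected = project(tree, {"block", "stmt"})
```

In this example the projected tree has 10 nodes instead of 14, which is the first of the three advantages listed above.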

A final form of the tree to consider for the comparisons is the abstract syntax tree. With abstract
syntax trees, the set of interesting nonterminals would presumably not be necessary, since such a tree
should not contain nonterminals that exist only to simplify parsing. Of course, the user might still not be
interested in all the nonterminals used in the abstract syntax tree. A possible example is an item that
covers only a small amount of text: if a subscript on an array reference changed, the user might prefer
having the array reference reported as changed, rather than just the subscript. A problem with abstract
syntax trees is that a change in the tree structure need not produce a corresponding change in the
leaves of the tree. An example is changing an if to a while. Both would have an expression and a
statement as children, and these would not change. It is of course possible to include parts of the syntax
in the leaves of the tree, but a true abstract syntax tree would seem not to contain them. Without
them, it would not be possible to locate the areas where the tree structure had changed by comparing the
leaves; the only way to find the differences would be to compare the trees in their entirety, which would
take considerable time. Involving only the terminals and interesting nonterminals in the comparisons
probably offers the advantages of comparing abstract syntax trees without the disadvantages.
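The if/while problem can be shown in a few lines of Python. In this hypothetical encoding the abstract syntax tree records the keyword only in the node label, so the leaves of the two trees are identical and only a full structural comparison notices the change, whereas a concrete tree with keyword leaves exposes it in the leaf sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    text: str = ""

def leaf_texts(node):
    """Texts of the leaves, in order."""
    if not node.children:
        return [node.text]
    return [t for c in node.children for t in leaf_texts(c)]

def same_tree(a, b):
    """Full structural comparison: labels, texts and children."""
    return (a.label == b.label and a.text == b.text
            and len(a.children) == len(b.children)
            and all(same_tree(x, y) for x, y in zip(a.children, b.children)))

def ast(kind):
    """Abstract syntax: the keyword lives only in the node label."""
    return Node(kind, [Node("expr", text="x > 0"), Node("stmt", text="x := x - 1")])

def cst(kind):
    """Concrete syntax: the keywords appear as terminal leaves."""
    second = "then" if kind == "if" else "do"
    return Node(kind, [Node("kw", text=kind), Node("expr", text="x > 0"),
                       Node("kw", text=second), Node("stmt", text="x := x - 1")])
```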

The fourth and final step in dividing the differences into logical units is to display the results to the
user. This has not yet received much thought, although a few points have been mentioned in the discussion
of the other steps. Some of the issues to decide are how to display changes that are physically distant but
logically related, how to display the unchanged sections between related changes, and whether to treat
sections that are matched in step two specially. No doubt more issues will arise as the other steps develop
and it becomes clear what information the comparison step can provide.

Dividing differences into logical sections can be done in many ways, and many combinations of methods
for the various steps are possible. Finding how the methods behave and which produce good results will be

The second feature I want to develop is summarizing differences. The summaries would also be
based on the parse tree structure. Algorithms developed for dividing differences into logical sections might
serve as a good base for summarizing differences. I want to investigate these possibilities. If that proves
fruitless, I will develop a method for summarizing differences independently of the methods for dividing
differences.

For both dividing and summarizing differences, a subset of the grammar’s nonterminals must be
chosen. If all the nonterminals were used for dividing, differences would be divided too finely, which would
be more confusing than helpful. For summarizing, viewing a summary for every nonterminal would be too
time-consuming. It would also not be worthwhile for the person viewing the differences, since the
nonterminals include some whose only purpose is to simplify parsing. To function well, the subsets will
probably have to satisfy certain conditions.

For dividing differences, a set of nonterminals that can generate all the terminals may be all that is
necessary. The requirements for summarizing will be more complex. The summaries should be available at
different levels, which probably means that each level will have its own set of nonterminals. These sets
will have to meet certain requirements, both individually and in relation to the sets of the levels above
and below.

Some characteristics of these sets of nonterminals seem desirable. The nonterminals in any one level
should be able to generate all the terminals in the language. In this way, any change in a higher-level
tree can be reported by reporting on changes in the lower-level trees. Getting nonterminals that satisfy
this restriction will not always be possible. For example, some grammars have punctuation terminals that
separate the elements of a list and appear as direct children of a higher-level nonterminal; no intermediate
level could generate them. As another example, a nonterminal needed for a set to generate all the
terminals may contain nothing of interest, or only terminals that rarely change. One grammar, for
instance, includes a nonterminal begin_symbol which goes to the token begin. Some difficulties can
certainly be overcome by manipulating the grammar; however, this cannot fix all of them and should not
be a requirement for using the difference system. Where sets that individually generate all the terminals
cannot be obtained, sets that come close should be used.
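Whether a candidate level generates all the terminals can at least be checked on a particular parse tree. The Python sketch below (with a hypothetical Node class) lists the terminals that have no ancestor in the level's set; a grammar-wide check would need an analysis of the productions, which is not attempted here. The example tree reproduces the begin_symbol and separator cases mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    text: str = ""

def uncovered(root, level):
    """Texts of the terminals that no nonterminal in `level` generates."""
    out = []

    def walk(node, covered):
        covered = covered or node.label in level
        if not node.children and not covered:
            out.append(node.text)
        for child in node.children:
            walk(child, covered)

    walk(root, False)
    return out

# Example: begin x := 1 ; y := 2 end
def stmt(*words):
    return Node("stmt", [Node("tok", text=w) for w in words])

tree = Node("block", [
    Node("begin_symbol", [Node("begin", text="begin")]),
    stmt("x", ":=", "1"),
    Node(";", text=";"),
    stmt("y", ":=", "2"),
    Node("end", text="end"),
])
```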

Since sets that all generate all the terminals cannot always be found, some way must be developed to deal
with changes in the larger tree that are not in any of the interesting subtrees. If the difference system has
already told the user that a statement was inserted, also reporting the insertion of a semicolon is not very
informative. But if the only change in a larger subtree is in terminals not generated by the next level, some
change must be reported. Otherwise the user could be misled, and could also come to distrust a system
that reports a difference at one level but no difference when more detail is requested.
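One possible reporting rule is sketched below in Python; it is only one option, with a hypothetical Node class and message wording. The units of the next level are compared as a sequence, and only when none of them differs but the text of the larger subtree does is a generic message produced. This keeps an inserted statement from also being reported as an inserted semicolon, while a deleted semicolon on its own is still reported.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    text: str = ""

def text(node):
    """Source text of a subtree, tokens separated by single spaces."""
    if not node.children:
        return node.text
    return " ".join(text(c) for c in node.children)

def units(node, level):
    """Nearest subtrees (possibly node itself) whose root is in `level`."""
    if node.label in level:
        return [node]
    return [u for c in node.children for u in units(c, level)]

def report(old, new, level):
    """Messages describing how subtree old became new, one level down."""
    old_u = [text(u) for u in units(old, level)]
    new_u = [text(u) for u in units(new, level)]
    ops = SequenceMatcher(None, old_u, new_u, autojunk=False).get_opcodes()
    msgs = [f"{tag}: {old_u[i1:i2]} -> {new_u[j1:j2]}"
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]
    if not msgs and text(old) != text(new):
        # The change lies only in terminals that no unit generates
        # (a separator, say); report it rather than claiming no difference.
        msgs.append("changed outside the units of the next level")
    return msgs

# Example data: statements separated by ";" terminals.
def block(*parts):
    kids = []
    for p in parts:
        if p == ";":
            kids.append(Node(";", text=";"))
        else:
            kids.append(Node("stmt", [Node("tok", text=w) for w in p.split()]))
    return Node("block", kids)

old = block("x := 1", ";", "y := 2")
```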

Some method for dealing with text that is not in the parse tree must be devised. Comments in programs
are an example of such text. Many decisions must be made: how the system decides when a comment is in a
subtree (is it in a subtree only if the terminals on both sides of it are in the subtree?); whether, if the only
thing that changed in a subtree was a comment, the subtree should be reported as changed; and whether a
different message should be used to report that the tree did not change but something attached to it did.
Whatever strategy is chosen, it should be general enough that the reporting makes sense for any language
that might be edited with a syntax-directed editor and for which a parse tree can be built. The strategy
must also make sense for text other than comments that might be attached to a parse tree without
being part of it.
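The rule suggested in the parenthetical question can be written down directly. In this Python sketch a comment is identified by the gap it occupies between terminals, and a subtree by the inclusive range of terminal indices it spans; the three report kinds returned by classify are hypothetical names for choices the text leaves open.

```python
def comment_in_span(gap, first, last):
    """A comment at `gap` lies between terminal gap-1 and terminal gap. It
    belongs to a subtree spanning terminals first..last (inclusive) only if
    the terminals on both sides of it are inside that subtree."""
    return first <= gap - 1 and gap <= last

def comments_in(comments, first, last):
    """Texts of the (gap, text) comments attached inside a subtree."""
    return [t for gap, t in comments if comment_in_span(gap, first, last)]

def classify(old_tokens, new_tokens, old_comments, new_comments):
    """One possible answer to the reporting questions above."""
    if old_tokens != new_tokens:
        return "changed"
    if old_comments != new_comments:
        return "unchanged; attached text changed"
    return "unchanged"

# Example: { header } x := 1 ; { note } y := 2
# Terminals are numbered 0..6; the header sits in gap 0, the note in gap 4.
tokens = ["x", ":=", "1", ";", "y", ":=", "2"]
comments = [(0, "{ header }"), (4, "{ note }")]
```

Under this rule the note belongs to the whole block but not to the second statement, and the header belongs to neither.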

Another factor to consider in choosing nonterminals for dividing and summarizing differences is how
the set of nonterminals for dividing differences and the sets for summarizing differences relate to each
other. A relationship may not be necessary, but the results might make more sense to the user if one
existed. In looking at the conditions the sets should satisfy, I will also examine what relationships might
profitably exist between them.

Another problem to be solved for summarizing differences is devising a scheme to store the many
methods for finding names in the trees. Some way to name the segment that has changed must exist, so
that the user will have some idea of which part of the program contains the change. This was explained in
more detail previously. I have not looked at this extensively, other than to identify some of the kinds of
things that would be reasonable names and that such a scheme should accommodate.
The third problem is to find a way to decide what level of differences to display in a summary when
the user does not specify a level. The problem seems to be to determine what relevant information is
available and how it can be used. Some information that might be useful is the tree structure, the number
of levels above and below a level, and the amount of text that a display at a certain level would generate.
Another question is whether the decision mechanism could be parameterized so that the user could have
some control.
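As one possible starting point, the decision could use only the amount of text each level would generate, with the user's control exposed as a single parameter. The function and its max_lines default below are hypothetical; the tree structure and the number of levels, also mentioned above, are not used in this sketch.

```python
def choose_level(lines_per_level, max_lines=20):
    """Pick the most detailed summary level whose display fits in max_lines.

    lines_per_level[i] is the number of lines the summary at level i would
    display, with level 0 the coarsest. Falls back to level 0 when no level
    fits. max_lines is the user-adjustable parameter."""
    fitting = [lvl for lvl, n in enumerate(lines_per_level) if n <= max_lines]
    return max(fitting, default=0)
```

For example, with per-level sizes of 2, 7, 30 and 120 lines, the default budget selects level 1, while a budget of 50 lines selects level 2.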

These are the five problems that I want to solve: dividing differences, summarizing differences,
choosing nonterminals for dividing and summarizing, storing schemes to find names for summarized
sections, and deciding what level of detail to display. I want to design methods that will solve these
problems.
Abstract:

The GNU Emacs editor has been incorporated into the SAGA Software
Development Environment as a uniform user interface. The
extensibility and interprocess communication features of GNU
Emacs are used to integrate several separate SAGA utilities,
including an incremental parser, an incremental semantics processor,
and a configuration management system.


1.  Introduction. 

Users spend a large percentage of their on-line time in an editor. Without
addressing the issue of interactive versus batch-oriented development
methodologies, this paper is concerned with maximizing the effectiveness of the
human-computer interface in a software development environment.

Our motivation was the need to integrate several different utilities with a common
user interface. A software development environment consists of a number of utilities
more or less tied together by some user interface. In the standard UNIX system, for
example, the user interface is commonly the shell. The different utilities are activated
by calling them explicitly from the shell, or implicitly from a script.

1.1. SAGA Software Development Environment: A software development
environment includes editors, compilers, linkers, loaders, and debuggers. Verification
systems and configuration management systems are also under development.

In the remainder of this paper, we describe the approach adopted to provide an
improved human interface to EPOS using the GNU Emacs editor. The editor provides
many typical features found in full-screen editors, is interfaced to raster display devices
as well as terminals, is programmable, and can be used with several different windowing
system packages, including the MIT X Window System. Finally, the GNU Emacs editor
provides a general interface which may be used with many other SAGA tools. Figure 1
illustrates the relationship between GNU Emacs and several SAGA tools. Figure 2
shows several features of the GNU Emacs environment which will be discussed in the
following sections.

2.  GNU  Emacs 

2.1. Standard Character-level Editing: GNU Emacs provides standard
character-level editing with a full-screen, multi-window, tiled display [1]. All the typical
character manipulation commands are available, as well as cursor movement, screen
paging, global search and replace, and the other things a programmer would expect in an
editor. Character-level editing is what programmers are used to, but the reason for using
GNU Emacs stems from its extensibility more than its familiarity.

Like other editors in the Emacs family, GNU Emacs allows the user to extend the
initial command set by using a LISP-like language to write functions which may then be
bound to key sequences. GNU Emacs LISP is a fairly complete LISP, extended to
include primitives for editing in a multi-buffer, multi-window context.

2.2.  Language  Specific  Modes 

Each buffer may have several modes associated with it, which correspond to buffer-specific
commands, variables, etc., appropriate for editing the text in the buffer. Several
language modes are typically provided in the Emacs library of LISP programs. A
language mode may be automatically associated with a buffer based on the name of the
file being edited.


2.3. Holophrasting, Tags, etc.: Several language-independent functions useful for
program development are provided with GNU Emacs. A global holophrasting feature
allows the user to select, for a specific buffer, the indentation level beyond which text is
not displayed. A general Tags facility allows the user to maintain a database of tags, which
are associations between names and references to locations within several text files.

2.4. Command Completion Templates: User-definable macros and abbreviations
are supported by GNU Emacs. A general completion function allows the user to build
customized tools for completing initial character sequences. We have used this
capability to provide a language-specific template system. The user enters the initial
characters of a symbol followed by a completion command (key press). If the initial
characters match one of the symbols in the completion list, they are replaced by the full
symbol name or by an associated template.

2.5. Help Facilities: GNU Emacs provides extensive on-line documentation of all the
editing commands. A user-defined command may make use of the same documentation
facility by including a documentation string in the command definition. Nevertheless,
given the large number of commands available to the user, it is sometimes difficult
to find the appropriate command quickly. We have written a hierarchical menu
interface to most of the Emacs commands in the style of Lotus 1-2-3. That is, a prefix
command opens up a single-line help menu; several lines of menu items are possible in the
case of large menus. The first letter of each menu item is a key command which opens
up a lower-level menu, and so on, down to a real command. When a real command is found,
the documentation string for the command is available; the associated key sequence for
the command, if any, may be reported; or the command may be executed immediately.
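A minimal Python sketch of such a menu follows; it is not the LISP code described above. The menu is a nested dictionary whose leaves are commands carrying a name, a documentation string, and an optional key binding. The menu contents and the abbreviated documentation strings are illustrative only, and the sketch assumes that first letters are unique within each menu level.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    name: str                  # Emacs command name
    doc: str                   # (abbreviated) documentation string
    key: Optional[str] = None  # bound key sequence, if any

MENU = {  # illustrative two-level menu
    "File": {
        "Find": Command("find-file", "Edit file FILENAME.", "C-x C-f"),
        "Save": Command("save-buffer", "Save current buffer in visited file.", "C-x C-s"),
    },
    "Search": {
        "Forward": Command("isearch-forward", "Do incremental search forward.", "C-s"),
    },
}

def menu_line(menu):
    """The single line shown for one menu level."""
    return "  ".join(menu)

def select(menu, letters):
    """Follow one menu item per typed letter, down to a menu or a command."""
    node = menu
    for ch in letters:
        if isinstance(node, Command):
            raise ValueError("already at a command")
        matches = [item for item in node if item[0].lower() == ch.lower()]
        if len(matches) != 1:
            raise KeyError(f"no unique menu item for {ch!r}")
        node = node[matches[0]]
    return node
```

Once select reaches a Command, the caller can show its doc, report its key, or run it, which corresponds to the three options described above.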

2.6. Incremental Parser: The first subprocess which has been installed under the
GNU Emacs front end is an incremental parser. The SAGA research group had
previously created an incremental parser with its own screen-oriented editor called EPOS.
The user interface of EPOS was difficult to use, and the large program was difficult to
maintain. We decided that the extensible GNU Emacs editor could be a powerful front
end for the incremental parser as well as for other SAGA projects.

With GNU Emacs as the front end, the user is allowed to modify any text in the
character representation of a program. As changes are made, the corresponding
terminal tokens in the parse tree representation of the program must be updated and the
tokens reparsed. The reparsing algorithm, described in [2], minimizes the extent of the
reparse by maintaining the parsing stack state at the time each token is parsed. In the
worst case, the whole text stream must be reparsed, but usually only a small
neighborhood around the change requires a reparse.
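The idea of saving the parser state before each token can be illustrated with a toy in Python. Here the "parser" only tracks a stack of open brackets, as a stand-in for a real parse stack. The reparse resumes from the state saved before the first changed token and stops as soon as, past the edited region, the new state equals the state saved for the corresponding old token; from that point on the old parse can be reused. This is a sketch of the general technique, not the algorithm of [2].

```python
def step(state, tok):
    """Toy deterministic parser: the state is the stack of open brackets."""
    closers = {")": "(", "]": "["}
    if tok in ("(", "["):
        return state + (tok,)
    if tok in closers and state and state[-1] == closers[tok]:
        return state[:-1]
    return state  # other tokens (and, in this toy, unmatched closers)

def full_parse(tokens):
    """states[i] is the parser state saved just before token i."""
    states = [()]
    for tok in tokens:
        states.append(step(states[-1], tok))
    return states

def reparse(tokens, states, start, end, replacement):
    """Replace tokens[start:end], resume from the saved state at `start`, and
    stop once the state before an unchanged old token equals its saved state.
    Returns the new tokens, the new saved states, and the tokens reparsed."""
    new_tokens = tokens[:start] + replacement + tokens[end:]
    new_states = states[:start + 1]
    shift = len(replacement) - (end - start)  # new index i <-> old index i - shift
    i, work = start, 0
    while i < len(new_tokens):
        new_states.append(step(new_states[-1], new_tokens[i]))
        i, work = i + 1, work + 1
        if i >= start + len(replacement) and new_states[i] == states[i - shift]:
            new_states.extend(states[i - shift + 1:])  # rest of old parse is valid
            break
    return new_tokens, new_states, work

old = list("(a[b]c)d")
saved = full_parse(old)
tokens, states, work = reparse(old, saved, 3, 4, ["x", "y"])  # b -> xy
```

Replacing b with xy reparses only the two new tokens, while deleting the ] upsets the bracket structure, so the toy reparses through to the end of the text, the worst case mentioned above.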

A modification of GNU Emacs (as distinct from a LISP extension) was required to
support the incremental parser. The modification made use of the Emacs Undo
capability, which allows the user to undo previous changes as far back as Emacs can
remember. As changes are made to the text buffer, they are passed to a LISP function
together with an identification of the kind of change. The LISP function collects
contiguous changes until a non-contiguous change is made. At that point, it sends the
contiguous change to the incremental parser. A reparse is performed automatically with
every new contiguous change, or an explicit reparse may be requested by the user.

An example of one contiguous change is given in Figure 3. A contiguous change
consists of a deletion and an insertion at the same point. As each new change is made, it
is incorporated into the current contiguous change if it overlaps or abuts it; otherwise
the change begins a new contiguous change.
Three kinds of changes are possible: insertion, deletion, and replacement. The
beginning and end points of a change may each fall in one of three regions relative to
the contiguous change: before, within, or after.
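The coalescing rule can be sketched in Python as follows. Positions are in current-buffer coordinates. Each contiguous change remembers the text the parser last saw (deleted) and the text now in the buffer (inserted), and the list sent stands in for the messages to the incremental parser. This is not the actual LISP function; the merge arithmetic is one way to handle the before, within, and after cases for both end points.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Change:
    start: int     # buffer position where the contiguous change begins
    deleted: str   # text the parser last saw in this region
    inserted: str  # text now occupying [start, start + len(inserted))

def merge(c, pos, deleted, inserted):
    """Fold an edit at pos (removing `deleted`, adding `inserted`) that
    overlaps or abuts the contiguous change c into a single change."""
    c_end = c.start + len(c.inserted)
    lo = min(c.start, pos)
    before = deleted[:c.start - pos] if pos < c.start else ""  # starts before c
    after = deleted[c_end - pos:] if pos + len(deleted) > c_end else ""  # ends after c
    segment = before + c.inserted + after  # current buffer text over the union
    rel = pos - lo
    new_text = segment[:rel] + inserted + segment[rel + len(deleted):]
    return Change(lo, before + c.deleted + after, new_text)

@dataclass
class Collector:
    current: Optional[Change] = None
    sent: List[Change] = field(default_factory=list)  # stands in for the parser

    def record(self, pos, deleted, inserted):
        c = self.current
        if (c is not None and pos <= c.start + len(c.inserted)
                and pos + len(deleted) >= c.start):
            self.current = merge(c, pos, deleted, inserted)
        else:
            self.flush()
            self.current = Change(pos, deleted, inserted)

    def flush(self):
        if self.current is not None:
            self.sent.append(self.current)
            self.current = None

def apply(text, change):
    """Apply a change to the text the parser last saw."""
    return text[:change.start] + change.inserted + text[change.start + len(change.deleted):]

# Example: edits to the buffer "hello world".
col = Collector()
col.record(5, "", "X")   # type X after "hello"
col.record(6, "", "Y")   # type Y just after it (abuts)
col.record(4, "o", "")   # delete the "o" before it (abuts)
col.record(0, "h", "H")  # elsewhere: starts a new contiguous change
col.flush()
```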

3.  Adapting  the  Parser  to  the  New  Interface 

To use the EPOS incremental parser with GNU Emacs as a front end, a new,
simplified command language was developed through which a front end can give
commands such as "move the cursor", "delete text", or "insert text". In principle, this
command language could be used by a human, and in fact it was so used for testing
purposes, but for any significant program this would be impractical. However, this
modularity means that the parser could be used with another front-end editor without
modification.

To be useful with a real text editor, the parser must be able to handle any text a
user may enter. The original EPOS editor allowed spaces only before tokens, and
consequently trailing blanks on a text line were not permitted. In addition, tabs could
not be used at all. As the parser was adapted to the GNU Emacs front end, these
unacceptable limitations were removed by changing the internal representation of the
tokens in the parse tree. Another limitation of the old EPOS was the restriction of
comments to a single line. Now, each line of a multi-line comment is a separate token.

In the process of extracting the parser and making modifications to it, a number of
previously unknown bugs were discovered and fixed. The changes were made with
the intention of supporting language independence: the only language-specific parts of
the parser which remain are in the lexical analyzer.

3.1. Multiple Syntax Errors: One of the advantages of the SAGA incremental
parser is that any number of syntax errors may be present in the parse tree
concurrently. This is accomplished by maintaining the erroneous, unparsable tokens under
a "marked" non-terminal. This marked text will be reparsed if it is affected by a future
change.

Often while editing a program, the programmer will find it most efficient to leave
the text in a syntactically erroneous state. An example is illustrated in Figure 4. To
enclose several statements in a Repeat loop, the initial "repeat" must be inserted first,
leaving a syntax error later in the program, usually at the point where an "until" is expected.

3.2. Text vs. Template Editing: An alternative to text editing with an incremental
parser is template editing. A template editor may restrict modifications of a program
text to syntactically correct transformations, or it may reparse the whole text, or, for
convenience, reparse at the expression level. In the example above, the task of
enclosing several statements in a Repeat loop involves first cutting all the statements,
then inserting the "repeat ... until" template, and finally pasting the statements into
the Repeat loop.

A text editor with an incremental parser provides the most flexibility by allowing
arbitrary text modifications while supporting templates if desired. We have implemented
a simple template system keyed on an initial substring of the template text. The user
enters the first few unique letters of a template followed by a "completion" key. If the
letters match a template, it is expanded in place of the letters. If the letters do not
match any template, an error message is given; if they match more than one
template, a help list of the possible matches may be displayed for the user in a separate
window.
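The three matching cases can be sketched in Python as below. The Pascal template texts are hypothetical placeholders, and the result tags ("expand", "error", "choices") are invented names for the three outcomes just described.

```python
TEMPLATES = {  # hypothetical Pascal templates; "\n" separates template lines
    "repeat": "repeat\n\nuntil ;",
    "record": "record\n\nend;",
    "while": "while  do\nbegin\n\nend;",
    "with": "with  do\nbegin\n\nend;",
}

def complete(prefix, templates=TEMPLATES):
    """Expand a unique prefix; otherwise report an error or the candidates."""
    matches = sorted(name for name in templates if name.startswith(prefix))
    if not matches:
        return ("error", f"no template matches {prefix!r}")
    if len(matches) == 1:
        return ("expand", templates[matches[0]])
    return ("choices", matches)  # shown as a help list in a separate window
```

For example, "rep" expands to the repeat template, while "w" yields the choices while and with.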

3.3. Parse Tree Commands: Since a parse tree representation of a user’s program
is being maintained, the user may wish to make use of it for more than error checking.
Typical commands which must interact with the parse tree include token and subtree
selection, forward and reverse motion by token or subtree, and subtree transformations.
Such user-level commands are "translated" by a LISP program into messages to the
parser. The parser responds with messages which may indicate the appropriate relative
character motion, region selection, or a replacement string.

We have developed a package of transformation routines to speed the conversion of
while loops to repeat loops, case statements to nested if statements, etc. Logical
consistency is maintained across the transformations by negating and reducing logical
conditions or duplicating statements, as required. The transformation routines run as an
additional subprocess and are given access to the parse tree. The output of a
transformation routine may simply be displayed in an alternate window or may be
inserted as a replacement string.
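Two of these transformations are sketched below in Python. The sketch assumes that conditions are simple relational expressions (lhs, operator, rhs), that the loop body is a single statement, and that the condition has no side effects. Negation is reduced by flipping the relational operator instead of wrapping the condition in not, and the repeat-to-while conversion duplicates the body. The functions work on text rather than on the parse tree and are only illustrative.

```python
NEGATION = {"<": ">=", ">=": "<", ">": "<=", "<=": ">", "=": "<>", "<>": "="}

def negate(cond):
    """Reduce  not (a op b)  to  a op' b  by flipping the operator."""
    lhs, op, rhs = cond
    return (lhs, NEGATION[op], rhs)

def show(cond):
    return " ".join(cond)

def while_to_repeat(cond, stmt):
    """while C do S  ==>  if C then repeat S until not C"""
    return (f"if {show(cond)} then\n"
            f"  repeat\n"
            f"    {stmt}\n"
            f"  until {show(negate(cond))}")

def repeat_to_while(stmt, cond):
    """repeat S until C  ==>  S; while not C do S   (S is duplicated)"""
    return f"{stmt};\nwhile {show(negate(cond))} do\n  {stmt}"
```

The guarding if is what keeps the first conversion consistent: a while loop may execute its body zero times, whereas a repeat loop always executes it at least once.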

4.  Incremental  Semantics  Processor 

An important component of the SAGA environment is semantics processing. An
incremental semantics evaluator is being developed which will run as another subprocess
under the GNU Emacs front end. Changes to the parse tree, and commands which
interact with semantic information, will be communicated to the semantics evaluator,
which maintains its own semantic-level representation of the program. The semantics
evaluator may also return commands or text to the editor.

As an example, the transformation of a while loop to a repeat loop described earlier
is more appropriately handled by a semantics-level routine. The type of
transformation and the subtree to be transformed are first sent to the semantics routine;
the transformation is applied; the new text representation is returned to the editor to
replace the original text; and the replacement action is sent to the incremental parser for
reparsing.

The semantics component of the SAGA environment will play an important role as
an attribute evaluator. In addition to incremental compilation, an attribute evaluation
system may be used for program verification, incremental refinement, and project
management. All of these, however, also require a unified user interface, and the
extensible GNU Emacs is suitable for providing it.

5.  Conclusion 

We have explored the practicality of using an extensible text editor as the front end
for a number of program development aids. GNU Emacs has proven equal to this
task, providing both the generality of a powerful text editor and the flexibility required
for communication with independently running subprocesses.
Abstract

Control of system configurations has long been a problem. The SAGA project is investigating
several such problems in the area of software development. Clemma, a prototype system for managing
configurations on several levels, is presented along with a discussion of the system’s guiding
principles.

1.  Introduction 

A growing problem in the development and maintenance of software projects is the organization,
manipulation, and storage of the large number of components involved. A single medium-sized software
system, with 10 to 50 thousand lines of code, may be composed of several dozen separate computer files.
Requirement specifications, design documents, project plans, user manuals, source code, and test data may
all be stored on-line, and all must be maintained throughout the lifetime of a project. This requires the
ability to track, identify, and control all changes made to a system’s files. As systems grow in size and
complexity, performing these operations becomes more difficult.

Another, more recent, problem is the distribution of a system’s component files. Modern software
development theory promotes modularity, the grouping of system components into logically related
clusters [ref?]. This technique has several recognized benefits, both to the software and to the engineers
involved in its production. Unfortunately, separating the components of a system makes it more difficult
to treat a large system as a single entity, or even as a limited number of modules. In addition, most
means of grouping software system components into modules are still relatively unsophisticated, and
seem to have little support in many development environments. What is needed is a way to refer to
and manipulate the components of a system on several different levels, from that of a single file to that
of a module to that of a system. Efforts at solving this problem are numerous, though few have gained
widespread use in “the real world” [12].

Traditionally, the task of keeping records on all material produced during a software project and
taking responsibility for change control has been the duty of a project librarian [ref?]. This entity (sometimes
a single person) is responsible for tracking all of the components developed, identifying the state of each,
and ensuring that a particular component is releasable for use by other project members or users. This
function is crucial to a project, as it is often the principal interface between management and staff for
gauging and controlling the progress of a project.

With the push to automate various functions of the software development life-cycle, a means of
tracking the state of a large system automatically was inevitable, and several efforts are notable. Early
efforts resulted in systems called project support libraries, which essentially automated some of the work of
a human project librarian. More recently, the entire area of identifying, tracking, and controlling changes
to systems has been classed as software configuration management (SCM). The effort to apply configuration
management techniques to software development has resulted in SCM, with varying degrees of success. The
principal problems stem from the youth of software engineering as a field of endeavour: not enough is yet
known about constructing software systems reliably to make automating their management easy. For this
reason, SCM is still an area open to experimentation.

Since the SAGA project (Software Automation, Generation, and Administration) is concerned with
automating the production of software systems, it was inevitable that we should investigate the issue of
configuration management for such systems. Our efforts to date demonstrate the need for an automated
means of handling the components of large, hierarchical software systems on several levels of abstraction.
A prototype configuration librarian, Clemma, is our attempt to provide a means of investigating the
problem of configuration management in large software systems.
2. Background

Before Clemma is discussed in more detail, a few of the terms relevant to a discussion of software
configuration management should be defined. A configuration, for our purposes, is a “snapshot” of the
components of a system, describing their states and their interrelationships at a specific point in time.
Each of the individual elements of a configuration is called a configuration item (CI). The effort to deal
with the problems of controlling the development and evolution of configurations is called configuration
management (CM). Specifically, Bersoff defines this as the discipline of identifying the configuration of a
system at discrete points in time for the purpose of systematically controlling changes to the configuration
and maintaining the integrity and traceability of the configuration throughout the system life cycle
[Bersoff, 84]. Software configuration management (SCM) is the application of CM techniques to projects
composed principally of software.

3. Clemma

Clemma is a prototype of a configuration control system. The system is modeled on a project library
concept, and as such most of its operations are analogous to conventional library operations.
But the requirements of a project librarian differ from those of a conventional librarian, so in some
places there are operations that are wholly new to the idea of a library.

An important aspect of Clemma is the choice of the fundamental configuration item in
the system: a file. Many of the efforts currently underway to provide configuration control for software
systems use such logical entities as subprograms and shared data structures as the basic controlled
items. In analysing the problems we wished to address, we found that the isolation and recognition
of logical entities within files vastly complicated the management issue, particularly in an environment
that will be multilingual and that, hopefully, will be used to model different development methodologies.
Current systems for dealing with independently produced project components seem to impose strict
constraints on the developer, so that every item produced conforms to format standards that allow the
system to identify the configuration items. Such systems thus pay a price in flexibility for this
sophistication.
However, files are relatively easy to handle. They are easily recognizable, have discernible attributes,
and can be manipulated easily in most operating systems. Rather than limit the applicability of
Clemma to software developed to rigid structural guidelines, we have opted to place some of the burden on
the user by allowing him/her to program as s/he wants. This does require the user to inform Clemma of
some of the logical attributes of a configuration item manually, as they would be difficult to determine
automatically. This characterization of items is under investigation, so an even finer granularity of
items may become possible in the future. For now, having the file as the basic element of a
configuration is sufficient.

3.1. System Architecture

As mentioned, Clemma is a configuration librarian. Configuration items are stored in libraries, and
nearly all of the available operations are analogous to those of a conventional library. A Clemma library
has four main parts: a repository, containing copies of all of the configuration items in the library; a
catalog, which is a database holding all of the information on the items in the repository; a temporary
storage pool for the read-only copies of checked-out items, called the user area; and a table of the current
users of the library items, called the usage list. The purpose of each of these structures will be detailed as
the operations provided by Clemma are described below.
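The four parts just described can be pictured as a small data structure. The following is a minimal in-memory sketch in Python, not Clemma's actual implementation (which uses Unix directories and the Troll database); the class name, field names, call-number format, and choice of keys are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClemmaLibrary:
    """Hypothetical in-memory model of the four main parts of a Clemma library."""
    # Repository: call number -> version chain (installed contents, oldest first).
    repository: dict[str, list[str]] = field(default_factory=dict)
    # Catalog: call number -> attributes describing the item.
    catalog: dict[str, dict[str, str]] = field(default_factory=dict)
    # User area: (call number, version) -> the single shared read-only copy.
    user_area: dict[tuple[str, int], str] = field(default_factory=dict)
    # Usage list: (call number, version) -> names of users who have it checked out.
    usage_list: dict[tuple[str, int], set[str]] = field(default_factory=dict)


# A library holding one cataloged file whose version chain is still empty.
lib = ClemmaLibrary()
lib.catalog["C0001"] = {"type": "file", "name": "parser.c"}
lib.repository["C0001"] = []
```

Keying the user area and usage list by (call number, version) reflects that checkouts are always of a particular version of an item.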

3.2. Operations

As a configuration librarian, Clemma has several functions.

• Create a configuration library. This operation causes all of the library data structures to be created
and initialized. The creator of the library also establishes directors for the library: individuals who
have total control over the creation and deletion of library elements and all other library
capabilities. A delete operation exists to undo all of the actions of create.

• Catalog a configuration item. This creates an entry for an item in the catalog. Information about
the item is collected and stored in the database, and an empty version chain is initialized for the item
in the repository. In addition, manager permissions are established for this item by the individual
who catalogs it. If a user has manager permission for an item, s/he then has total control over that
item. (Directors subsume all of the powers of a manager.) The uncatalog operation performs the
inverse of the catalog operation.

• Install a version of a configuration item. A copy of a cataloged item is attached to the version chain
for that item in the library repository. Additional information about the particular version being
installed is collected and stored in the catalog. Note that only an item manager or someone who has
been granted permission by a manager may install a version of an item. Managers may create a list
of allowable users to restrict access to an item. Remove is the operation that removes installed
versions from a library.

• Checkout a version of a CI for read-only use. This gives the user performing the operation access
to a read-only copy of a library item. The copy resides in the user area, and is shared by all
individuals who have checked the item out for read-only use. If access to the item is restricted, then a
manager of the item must give an individual permission to check out the item. When an item is
checked out of the library, an entry recording this fact is made in the usage list.

• Checkout a version of a CI for modification. This gives the user a writable copy of the CI. Permission
to check an item out for modification must be granted by a director or one of the item’s
managers. An entry is made in the usage list when the item is checked out.

• Return a version of a CI. This operation is used for returning a checked-out copy of an item to the
library. The user’s access to the item or local copy is removed and the user’s name is deleted from
the usage list for that item. This does not, however, put any revised items in the repository; the
install operation must be used for that purpose.

• Collect individual configuration items into a single item. This operation is used for the creation of
collections, which are formatted lists of configuration items. These require some explanation. When
a software system is created, it is often broken up into modules for reasons well known to
structured-programming enthusiasts. In a configuration, one often wants to treat as CIs not only the
individual files in the configuration but also the modules into which the system is divided. To do
this in Clemma, all of the useful files of a module are first cataloged and installed as CIs. When the
files are cataloged, they are each assigned a call number, which uniquely identifies a particular CI to
the library. The collect command takes the list of CIs comprising a module, creates a specially
formatted list of their call numbers, and stores this in a file. This file can then be cataloged and
installed as its own (albeit special) CI. The type of a CI (file or collection) is stored as an attribute of
the CI in the catalog.

• Assign attribute values to a configuration item. This operation is used to store attribute values for
CIs in the catalog.

• Compare the differences between versions of a configuration item. This prints out a listing of the
changes made from one version of a specified CI to another.

• Identify items from the library. For a given item, it is often necessary to provide a history of the
item and its development. The identify operation prints a formatted listing of all the information
pertinent to a configuration item or a particular version of a configuration item. This information
allows reasoned decisions to be made about the item.

• Retrieve items from the library using attribute-matching. The retrieve operation yields the call
numbers of all the CIs in the library that have attribute values matching those given to the
operation. This allows indexing of items, and is a great aid to promoting re-use of software components.
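The permission rules in the list above (directors and managers, grants to install or modify, restricted read access, the usage list, and a return that installs nothing) can be summarized in code. This is a hypothetical Python sketch under simplifying assumptions: contents are strings, versions are numbered from 1, and every name (`Library`, `grant`, `restrict`, the call-number format) is invented here rather than taken from Clemma.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical sketch of Clemma-style permission and usage-list rules."""
    directors: set[str]
    repository: dict[str, list[str]] = field(default_factory=dict)   # version chains
    catalog: dict[str, dict] = field(default_factory=dict)           # item attributes
    user_area: dict[tuple[str, int], str] = field(default_factory=dict)
    usage_list: dict[tuple[str, int], dict[str, str]] = field(default_factory=dict)
    next_number: int = 1

    def catalog_item(self, user: str, name: str) -> str:
        """Create a catalog entry and an empty version chain; the cataloger becomes manager."""
        call = f"C{self.next_number:04d}"
        self.next_number += 1
        self.catalog[call] = {"name": name, "type": "file", "managers": {user},
                              "restricted": False,
                              "grants": {"install": set(), "read": set(), "modify": set()}}
        self.repository[call] = []
        return call

    def is_manager(self, user: str, call: str) -> bool:
        # Directors subsume all of the powers of a manager.
        return user in self.directors or user in self.catalog[call]["managers"]

    def may(self, user: str, call: str, right: str) -> bool:
        return self.is_manager(user, call) or user in self.catalog[call]["grants"][right]

    def grant(self, manager: str, call: str, user: str, right: str) -> None:
        if not self.is_manager(manager, call):
            raise PermissionError(f"{manager} does not manage {call}")
        self.catalog[call]["grants"][right].add(user)

    def restrict(self, manager: str, call: str) -> None:
        """Limit read-only checkout to managers and users granted 'read'."""
        if not self.is_manager(manager, call):
            raise PermissionError(f"{manager} does not manage {call}")
        self.catalog[call]["restricted"] = True

    def install(self, user: str, call: str, contents: str) -> int:
        if not self.may(user, call, "install"):
            raise PermissionError(f"{user} may not install {call}")
        self.repository[call].append(contents)
        return len(self.repository[call])              # versions are numbered from 1

    def checkout(self, user: str, call: str, version: int, modify: bool = False) -> str:
        entry = self.catalog[call]
        if modify and not self.may(user, call, "modify"):
            raise PermissionError(f"{user} may not modify {call}")
        if not modify and entry["restricted"] and not self.may(user, call, "read"):
            raise PermissionError(f"access to {call} is restricted")
        key = (call, version)
        contents = self.repository[call][version - 1]
        self.usage_list.setdefault(key, {})[user] = "modify" if modify else "read"
        if modify:
            return contents                            # the user's own copy to edit
        return self.user_area.setdefault(key, contents)  # one shared read-only copy

    def return_item(self, user: str, call: str, version: int) -> None:
        """Drop the user from the usage list; nothing is installed in the repository."""
        key = (call, version)
        users = self.usage_list.get(key, {})
        users.pop(user, None)
        if "read" not in users.values():
            self.user_area.pop(key, None)              # no readers left for the shared copy
```

Keeping read-only copies in `user_area` keyed by (call number, version) mirrors the point that one copy is shared by all readers; a writable checkout returns a separate copy that re-enters the repository only through `install`.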

All of the operations implemented thus far in Clemma have been chosen for their accordance with
the library model and for their applicability to the problems involved in software configuration
management. Perhaps their prime value, however, is as a means of investigating the types of operations that
would naturally be required by someone trying to perform configuration control on a developing software system.
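The collect and retrieve operations fit together naturally: a collection is just a CI whose contents are call numbers, so a system can itself be a collection of module collections. In the sketch below, the `#collection` header line, the function names, and the attribute names are all hypothetical; the text does not describe Clemma's actual collection format.

```python
def collect(call_numbers: list[str]) -> str:
    """Format call numbers as the contents of a collection item (hypothetical format)."""
    return "#collection\n" + "".join(f"{call}\n" for call in call_numbers)


def members(collection_text: str) -> list[str]:
    """Read the call numbers back out of a collection item."""
    lines = collection_text.splitlines()
    if not lines or lines[0] != "#collection":
        raise ValueError("not a collection")
    return lines[1:]


def expand(call: str, catalog: dict[str, dict[str, str]], contents: dict[str, str]) -> list[str]:
    """Resolve a CI to the file CIs it denotes, descending through nested collections."""
    if catalog[call]["type"] == "file":
        return [call]
    files: list[str] = []
    for member in members(contents[call]):
        files.extend(expand(member, catalog, contents))
    return files


def retrieve(catalog: dict[str, dict[str, str]], **wanted: str) -> list[str]:
    """Call numbers of all CIs whose attributes match every requested value."""
    return sorted(call for call, attrs in catalog.items()
                  if all(attrs.get(name) == value for name, value in wanted.items()))


# Hypothetical catalog: two files, one module collection, one system collection.
catalog = {
    "C0001": {"type": "file", "language": "C"},
    "C0002": {"type": "file", "language": "Prolog"},
    "C0003": {"type": "collection", "level": "module"},
    "C0004": {"type": "collection", "level": "system"},
}
# contents: call number -> text of the installed version of each collection.
contents = {"C0003": collect(["C0001", "C0002"]),
            "C0004": collect(["C0003"])}
```

With this data, `expand("C0004", catalog, contents)` returns `["C0001", "C0002"]`, and `retrieve(catalog, type="file", language="C")` returns `["C0001"]`, which shows the same item being handled at the file, module, and system levels.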

4. Implementation Issues

The current implementation of Clemma is based on the capabilities provided by the Unix™ operating
system, and some of the terminology used is specific to that system.
The implementation of the four principal data structures comprising the library is fairly
straightforward. A library is given a home directory when it is created, and subdirectories are set up for the
repository, the user area, the catalog and the usage list. This helps to provide some encapsulation for the
underlying implementation of each. The repository will hold copies of all the CIs in the library, which could
possibly number in the thousands. For this reason, we are investigating various means of file organization,
compression, and archival so that an efficient means of dealing with such a large number of files may be
ascertained. Robustness is also a strong concern, as any system such as this must assure its users that
their components, when installed in a library, are as secure as, or even more secure than, they would be
when left in the users’ own directories. Various protection schemes that may provide this security are
under scrutiny.

The catalog of a library is probably the next most important data structure. It is used to provide
the central storage and indexing facility for all of the attributes of the library items. As this function is
primarily that of a database, the Troll/USE DBMS is being used in the current implementation of
Clemma. Troll provides a powerful, flexible, robust interface to the catalog, and seems to be a tool which
will have a great deal of applicability in the future of Clemma and other SAGA tools.

The usage list is primarily an indexing tool, and so may also be implemented in Troll. Because its nature
is somewhat simpler, however, other types of structure are being looked at as a method of
implementation. If the inherent slowness of a DBMS can be avoided while still providing the necessary
function and robustness, then such efforts are worthwhile.

The last structure of a library, the user area, is the simplest. This is a directory of read-only copies of
checked-out items. The user gets a link to one of the copies when s/he does a check-out on that item, and
the link is removed when the item is returned. Since all of the copies are owned by the director of the
library, there is no chance for accidental deletion of the item by the user. This scheme provides a simple
means of controlling the sharing of such items by several users.
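The directory layout and the link-based user area described in this section can be sketched with ordinary POSIX calls. This Python sketch assumes hypothetical subdirectory names and uses hard links, although the text does not say which kind of link Clemma uses. It also cannot make the copies owned by a separate director account without extra privileges, so it approximates that protection by making the shared copy read-only (mode 444).

```python
import os
import stat
import tempfile

# Hypothetical subdirectory names; the text only says there is one per data structure.
SUBDIRS = ("repository", "catalog", "user_area", "usage_list")


def create_library(home: str) -> None:
    """Give the library a home directory with one subdirectory per data structure."""
    for sub in SUBDIRS:
        os.makedirs(os.path.join(home, sub), exist_ok=True)


def checkout_readonly(home: str, item: str, contents: str, user_dir: str) -> str:
    """Create the shared read-only copy once, then give the user a hard link to it."""
    copy = os.path.join(home, "user_area", item)
    if not os.path.exists(copy):
        with open(copy, "w") as f:
            f.write(contents)
        # r--r--r--: stands in for "owned by the director" in this sketch.
        os.chmod(copy, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    link = os.path.join(user_dir, item)
    os.link(copy, link)
    return link


def return_item(user_dir: str, item: str) -> None:
    """Remove only the user's link; the copy in the user area is untouched."""
    os.remove(os.path.join(user_dir, item))


if __name__ == "__main__":
    # Demo in a throwaway directory: check an item out, then return it.
    with tempfile.TemporaryDirectory() as tmp:
        home, alice = os.path.join(tmp, "lib"), os.path.join(tmp, "alice")
        os.mkdir(alice)
        create_library(home)
        checkout_readonly(home, "parser.c.1", "int main(void) { return 0; }\n", alice)
        return_item(alice, "parser.c.1")
```

Removing a link requires write permission on the user's own directory, not on the copy, so returning an item (or deleting the link by mistake) leaves the shared copy in the user area intact for other readers.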
5. Conclusion

Clemma is an attempt to provide a simple, flexible means of constructing and maintaining
configurations of small- to medium-sized software systems. The basic premise is the treatment of the
components of a system as attributed objects, and the use of a library model for the storage, indexing, and
sharing of these objects in a configuration of a system. The Unix® file system is used as the implementation
medium, and the Troll/USE DBMS is used to provide for the storage and indexing of the attributes of the
stored items. We believe that Clemma is a useful tool and one that will provide many important insights
into the problems involved in software development.
1. Preface

This paper is a statement of ideas that are currently being investigated. We believe that many of
them will be useful in a software management tool. We would appreciate comments, criticisms, and
references to similar work.

2. Introduction

We wish to automate much of the management and tracking of the products involved in the lifetime
of a software system. To do this we need a model of the tasks involved and a means to implement the
model. We present a consumer producer model based on a production cycle that occurs in what we
view as similar situations in the “real world,” e.g. the construction industry [Spector and Gifford, 86].
For the implementation, we can at present offer only speculation about the characteristics of the tool. We
close by relating this management tool to some of the ideas in the literature and to the SAGA project.

3. Model

We base our model on a management by objectives approach where a producer satisfies the need of a
consumer. A consumer has a need for a product, either goods or services, which he must request from
someone other than himself. Therefore, he procures a producer to provide the product.

This model (see Figure 1) is a simplification of what we perceive to be the process in the “real
world.” Often the product requirements are given to many producers, who submit proposals for a product
that satisfies the consumer’s requirements. The consumer then chooses the producer with the best proposal
and works with him to create a specification for the product. The producer then creates a product to meet
this specification. In some sense, the specification is implemented under timing constraints and other
acceptance criteria. After the consumer has received and accepted the product, the production cycle ends.
We call this production cycle a task. Finally, we note that if a producer is not able to satisfy the
specification at some point during the course of production, then the consumer and the producer may agree
to some revision of the specification so that it may be satisfied.

In this model, a specification specifies the consumer, producer, resources supplied by the consumer,
product(s) to be delivered by the producer (including progress reports), start and end dates, delivery dates,
and acceptance criteria for the product(s). The specification does not specify how the producer will fulfill
it. A specification may request such a large product that the producer in turn becomes the consumer
of several (sub)products. These (sub)products should be invisible to the original consumer.

4. Characteristics

In this section we describe some of the characteristics of the proposed management system. We
include some basics about the tasks used in the system and the requirements made of the system.
   Consumer                                   Producer

   Product Requirements
                                              Product Proposal
   {create/revise specifications              modify/accept specifications}*
   authorize task                             begin implementation
                                              {sometimes some problems
                                               resulting in specification revision}
                                              Product is delivered
   Product is accepted

Figure 1. Consumer Producer Model.


4.1. Tasks

The task is the basic entity in the system. It will be a highly structured document in two parts. The
first part is the specification. The second part is the implementation. Both parts will be machine
interpretable. This will be accomplished with programming language techniques, although the tasks and
their relationships form a database. For example, we expect the specification and implementation parts to
have a relationship similar to that between the definition and implementation modules in Modula-2, while
keeping track of the state of tasks and the relationships between them is best done using database methods.

The task definition will be visible to both consumer and producer. It contains consumer
identification, producer identification, start date, end date, delivery dates, resources supplied by the
consumer, products to be delivered, and their acceptance criteria.
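The division between a public specification and a private implementation, with the producer becoming the consumer of (sub)products, can be modelled roughly as follows. This Python sketch is an assumption throughout: the class and field names are chosen to mirror the fields listed above, and `view_for` is a hypothetical way of expressing that (sub)tasks stay invisible to the original consumer.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Product:
    name: str
    acceptance_criteria: str


@dataclass
class TaskSpecification:
    """Public part of a task, visible to both consumer and producer."""
    consumer: str
    producer: str
    start: date
    end: date
    delivery_dates: list[date]
    resources: list[str]          # supplied by the consumer
    products: list[Product]       # including progress reports


@dataclass
class Task:
    spec: TaskSpecification
    # Private to the producer: (sub)tasks for which the producer is the consumer.
    subtasks: list["Task"] = field(default_factory=list)

    def delegate(self, sub_producer: str, **spec_fields) -> "Task":
        """Make this task's producer the consumer of a (sub)product."""
        sub = Task(TaskSpecification(consumer=self.spec.producer,
                                     producer=sub_producer, **spec_fields))
        self.subtasks.append(sub)
        return sub

    def view_for(self, person: str) -> dict:
        """What a participant may see: the implementation only if s/he is the producer."""
        view = {"spec": self.spec}
        if person == self.spec.producer:
            view["subtasks"] = list(self.subtasks)
        return view
```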

The task implementation is the part of the task that is private to the producer. It includes the
definitions of (sub)tasks and possibly other actions that the producer must perform. It is expected that
these (sub)tasks and actions may be related in a manner similar to the events in PERT charts. Simple
PERT charts are not sufficient, however, because we need to be able to “execute” them. In particular, we
may wish to use looping constructs that “trigger” on resources or inputs supplied by the consumer. For
example, in a change control board we would like all user change requests to follow the same procedure
during a maintenance task.
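An “executable PERT chart” of the kind described above might, at its simplest, order the steps of a procedure by their dependencies and repeat that procedure for every consumer input. The sketch below assumes Python 3.9+ for the standard-library `graphlib.TopologicalSorter`; the change-control steps and the function name are hypothetical examples, not a procedure taken from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical change-control procedure: each step maps to the steps it depends on,
# like the arcs of a PERT chart.
PROCEDURE = {
    "log request": set(),
    "analyze": {"log request"},
    "board decision": {"analyze"},
    "notify originator": {"board decision"},
    "update status": {"board decision"},
}


def run_maintenance_task(change_requests, procedure=PROCEDURE):
    """Trigger the same procedure once for each user change request.

    Within one request, steps run in an order that respects every dependency;
    the outer loop is the "trigger" on inputs supplied by the consumer.
    """
    order = list(TopologicalSorter(procedure).static_order())
    trace = []
    for request in change_requests:
        for step in order:
            trace.append((request, step))   # a real system would perform the step here
    return trace
```

A plain PERT chart would describe the five steps once; the loop is what lets every incoming change request follow the same procedure, which is the property the text asks for.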

4.2. System Requirements

We want the management system to accept task specifications, execute task implementations,
“notify” consumers of producer failures for certain criteria, and automatically generate certain types of
reports (given some description by the consumer within the task specification). However, we currently
assume that managers will use separate tools for such things as cost estimation and that data from these
tools would be entered into tasks by hand during their development. (A project history may be
maintained by the system so that these tools and future projects could draw on the experience of present
and past projects.)
We expect the management system to be able to run if only task specifications are available. In
other words, a consumer and a producer may authorize a task before the producer completes or even starts
the task’s implementation. The implementation must, however, be completed before the system needs to
execute it, i.e. before the start date of the task.
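The rule that a task may be authorized on its specification alone, but must have an implementation by its start date, amounts to a simple check the system could run each day. The field and function names below are hypothetical, and the step of reporting the result to the consumer is left out of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Task:
    name: str
    start: date
    implementation: Optional[list[str]] = None   # None until the producer writes it
    authorized: bool = False


def authorize(task: Task) -> None:
    """Authorization needs only the specification; the implementation may come later."""
    task.authorized = True


def missing_implementations(tasks: list[Task], today: date) -> list[str]:
    """Authorized tasks that have reached their start date without an implementation.

    These are the tasks the system could not execute, and so would have to report.
    """
    return [t.name for t in tasks
            if t.authorized and t.implementation is None and today >= t.start]
```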

The early stages of requirements may be done as informal development of the task specification by
the consumer and the (prospective) producer(s). Authorization of the task would be at the time that it is
submitted to the management system (e.g. compiled and loaded).

If we are to allow for task revision as noted in our model, then we need a very flexible system, to say
the least. This may be handled in part by a version control mechanism. We hope to avoid full-fledged
object-oriented systems like Smalltalk because of their complexity and the difficulty of implementing them
efficiently. We do note that a blend of programming language techniques (e.g. task contents) and database
techniques (e.g. report generation) will be required.

Finally, we would like to have a friendly user interface. It may be possible to do task specifications
with form fillers or structured document editors (e.g. [Kimura, 86]). For the implementations, we would
prefer a graphical interface, as it would make the PERT qualities more apparent. We note, however, that
our goal is to build a management system, not a slick user interface.

5. Conclusion

We have presented a consumer producer model as the basis for a software management system.
We now mention some relationships of the software management system (tool) to the literature and the
SAGA project.

5.1. Literature Relationships

The STARS Project [Martin, 83] defined various task areas in software engineering that it would
work on. One of these is the project management task area [Lubbes, 83]. In [Lubbes, 83], Lubbes presents
a table of functional capabilities for software management. Figure 2 shows those capabilities that we
believe this proposed system will support in some part.

We also believe that the consumer producer model can support various management structures. In
[Daly, 79], Daly compares and contrasts the three main management structures: project, functional, and
matrix. Even though we have not yet worked through detailed examples, we believe that the consumer
producer model is sufficiently flexible to capture the dependencies in each structure, including those in
which one person is responsible to more than one manager.

Some similar ideas appear in the BRICS system [Howes, 84], which was initially done manually. An
automated version was (is) under development. Among these ideas is the ability to model work breakdown
structures. Tasks and sub-tasks should be able to model work breakdown structures nicely.

We are still searching the literature for information about such management systems. We also
expect that there are some corporate systems without publication exposure.

5.2. SAGA Relationships

The management system will be integrated with other SAGA tools. The most important of these
tools are the configuration management and electronic communications tools (see Figure 2). Resource
specifications for access to system documents, libraries, and workspaces may be given in tasks. During
execution, tasks will call upon the configuration management tool to supply resources. Communication of
tasks and notification of status changes may be done using mail, notesfiles, or trays.

In [Campbell and Terwilliger, 86], there is an example using a change control board in the SAGA
ENCOMPASS environment. Figures 3 and 4 show some kinds of forms on which task specifications could
be based. We have used some BNF-like notation in the figures to indicate the use of standard forms for
tasks, i.e. the system may be able to support different “types” of tasks. At this time, we are still
investigating semantics for the relationships of the data in a task. These semantics will depend on the
database aspects of the management system.
   ...                            ... possibly to other tools
   ...                            ... sick, 84], and/or trays [Campbell and Terwilliger, 86]
   Interactive Work Planning      creation/revision of tasks
   Configuration Management       interface to configuration manager for resource
                                  allocation and work ...

Figure 2. Management Tool Capabilities.


User Change Request <request_id>

   Originator: <person>
   At: <address>
   Phone: <phone>
   {Net: <net_address>}
   Received: <date>
   Accepted: <date>
   Closed: <date>

   Product: <id>
   Product Number: <product_number>
   Version: <version_id>
   Related Products: {
      Product: <id>
      Product Number: <product_number>
      Version: <version_id>
   }

   Request Type: <Error|Modification|Enhancement>
   Severity: <severity_level>
   Current Behavior: <text>
   Requested Behavior: <text>

   Receiver: <person>
   At: <address>
   Phone: <phone>
   {Net: <net_address>}

   Resolutions
      [Temporary:
         <date>
         <Restriction|Workaround|Patch|Simulation>
         <text>]
      Permanent:
         <date>
         <Update|Release>
         <text>

Figure 3. User Change Request.
Program Modification Request <request_id>

   Requested by: <person>
   At: <address>
   Phone: <phone>
   {Net: <net_address>}
   Received: <date>
   Accepted: <date>
   Completed: <date>

   Analyst: <person>
   At: <address>
   Phone: <phone>
   {Net: <net_address>}

   Associated UCR: <pointer_to_user_change_request>
   Resources: {<access_to_other_services>}

   Findings: <text>
   Recommendation: <Accept, Reject>
   Associated PMP: <pointer_to_program_modification_plans>

Figure 4. Program Modification Request.
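Following the BNF-like notation of Figures 3 and 4, each form can be read as a record type: `<A|B|C>` as an enumeration, `{ ... }` as a repeatable group, and `<pointer_to_...>` as a reference to another record. That reading of the notation is an assumption, and the Python sketch below covers only part of each form (the receiver and resolution sections of Figure 3 are omitted); all type and field names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class RequestType(Enum):          # <Error|Modification|Enhancement>
    ERROR = "Error"
    MODIFICATION = "Modification"
    ENHANCEMENT = "Enhancement"


@dataclass
class Contact:
    person: str
    address: str
    phone: str
    net_addresses: list[str] = field(default_factory=list)   # {Net: <net_address>}


@dataclass
class ProductRef:
    product_id: str
    product_number: str
    version_id: str


@dataclass
class UserChangeRequest:
    request_id: str
    originator: Contact
    product: ProductRef
    request_type: RequestType
    severity_level: str
    current_behavior: str
    requested_behavior: str
    related_products: list[ProductRef] = field(default_factory=list)   # { ... }
    received: Optional[date] = None
    accepted: Optional[date] = None
    closed: Optional[date] = None


@dataclass
class ProgramModificationRequest:
    request_id: str
    requested_by: Contact
    analyst: Contact
    associated_ucr: UserChangeRequest      # <pointer_to_user_change_request>
    resources: list[str] = field(default_factory=list)
    findings: str = ""
    recommendation: Optional[str] = None   # "Accept" or "Reject"
```

Parsing a form field such as `RequestType("Enhancement")` then rejects any value outside the listed alternatives with a `ValueError`, which is one way a system could support different “types” of tasks.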
1. Introduction

This paper is a summary of the software development organization and practices used in the System
75 and related projects at AT&T Information Systems in Middletown, NJ. It is the result of a one-week
observation by Robert Sum (the author) of the University of Illinois at Urbana-Champaign, Urbana, IL.
The purpose of the week’s observation was to gather information about current software engineering
practices for input to research projects in software engineering at the University.

The next section of this paper presents an overview of the AT&T life cycle and the AT&T management
structure. Subsequent sections discuss various processes in the life cycle from the viewpoints of the
people and meetings the author attended. Often the content of these discussions is derived primarily
from a meeting with one particular person. During these discussions, some specifics about tools will be
presented, including things that work, things that do not work, and suggestions for things people would
like to have. The paper closes with a summary, acknowledgements, and references.

2. AT&T Organization

In this section, the life cycle and personnel structure that AT&T uses in software development are
described. It should be noted that the author observed several projects at different stages of development.
Even though all were based on the same philosophy and common ancestry, there were some differences.
This paper is a synthesis of these projects’ development. Hopefully, it reflects their philosophy in its most
recent form without evolutionary differences causing problems.

2.1. Life Cycle

The AT&T software life cycle is essentially a classical “waterfall” model. Figure 1 outlines this life
cycle, showing the processes (ellipses) involved and their products (boxes). The processes used are: product
definition, requirements definition, architecture definition, feature definition, high level design, code and
test, integration and quality development, system test, and qualification. Most products are documents
until the last half of the life cycle, where code is produced. The documents and code produced include:
technical prospectus, requirements, architecture, feature specifications, external design, development
specifications, development code, system code, field code, and released code. The major divergence from
the classical “waterfall” model in the AT&T model is the separate, independent development of the
system’s architecture and the system’s features. This separation is also visible in the documents produced
during development, where one side (left) is concerned with the external behavior of the system and the
other with its internal construction. (In relating Figure 2 to Figure 1, the process specifications and process
decomposition specifications are parts of the development specifications. Also, dashed boxes refer to code
while solid boxes are documents.)

Figure 1. AT&T Life Cycle (cont’d)

One should note that Figure 1 describes the primary development cycle and that several other
smaller cycles run in parallel and interact with it. These other smaller cycles include system test
development and project management. System test uses a development cycle that is very similar to the
primary development cycle. It includes system test plans and various design and implementation steps. The
project management cycle is based on sampling of the primary development cycle to monitor its progress
and ensure its integrity.

Figure 2. Document Hierarchy

2.2. Personnel Structure

The development process is managed through many specialized groups. These include systems engineering, project management, project coordination, software design, software tools, development, integration, quality development, system test, and field support. While most groups have input to several of the processes in the life cycle, many of the groups control a particular process. For example, the product definition process is controlled by systems engineering in that it produces the technical prospectus, but project management and software design provide an equal if not greater amount of input to the technical prospectus. The technical prospectus is also reviewed by system test, field support, and other groups so that any difficulties they foresee can be addressed early in the project’s lifetime. Figure 3 lists most of the relationships between development groups and development processes.

AT&T’s personnel structure is project oriented, with some leanings toward a matrix structure to gain some of its management advantages. For instance, the developers are all devoted to a particular project, but a member of a project coordination group may have several projects to coordinate. It is also possible for one person to do more than one job, such as project management and heading a development team. Tasks such as project management are often distributed in a functional manner. This has become even more prevalent with the newest project, which is being developed concurrently at sites separated geographically and computationally. The basic structure of the personnel and project that the author observed is depicted in Figure 4.

3. Systems Engineering

Systems Engineering (SE) is the liaison between development and marketing. Initially, SE receives from marketing the perceived needs of the customer. Then, SE receives from development information about technical capabilities that the customer may want. It is possible for development to propose a project and then have SE check with marketing to see if it is marketable, but this is uncommon, primarily because this approach has a high failure rate in producing marketable products. Often, SE acts as a mediator between marketing’s perception of the customers’ needs and development’s desire to create a system with all of the latest and greatest technology. After doing its own analysis, SE decides whether to initiate the development of a new product.

To initiate a new product, SE begins the product definition process of the main development cycle (shown in Figure 1). SE brings together the Project Management, Architecture, and Development people with the goal of producing the Technical Prospectus (TP). It is in producing the TP that SE often faces its most trying times as a mediator. The Technical Prospectus describes the purpose of the product, its environment, and its features. This document is informal, and its level of detail varies from item to item. It may contain only half a page per feature and still be 200 pages long.

After the TP has been completed, SE works together with Architecture and Development during the Requirements Definition process to produce a formal Requirements document that clearly states the features to be provided by the product and the resources required to develop and maintain it.

The last major interaction that SE has with a project is the review of the feature specifications, which are written by the Development group. At this time SE makes sure that the features described earlier in the Requirements are specified correctly, so that upon implementation the product will meet the customer’s needs.

Final Note: This information about SE is derived from various meetings and conversations, as the author did not have the opportunity to meet with someone from Systems Engineering.

4. Project Management and Coordination

In this section the author includes both Project Management and Project Coordination because of their close relationship in managing and monitoring a project. Project Management concerns the overall organization of the project, concentrating on the resources (people, machines, etc.) needed to develop and maintain it. Project Coordination concerns the organization of the project in time, tracking deliverables and their delivery dates to ensure that the project stays on schedule. Finally, we spend some time discussing the main meetings used to monitor product development.

4.1. Project Management

In general, textbook (formal) management methods exist, but they run into problems when applied to actual development projects. Many of these problems arise from the fact that most projects have a history, i.e. very few projects actually start totally from scratch. History-related problems include uncertainties in re-used code, incremental feature development, retrofitting new features into old products, and merging technologies. Other problems result from the inexactness with which certain resources (most notably people) can be measured and predicted. Resource problems include variations in personnel experience, personnel productivity, and personal preferences, which can require a lot of political effort to resolve.

Project Management (PM) is most visible during the early processes of the development cycle. It has direct input during Product Definition and prepares the Project Plan in association with the Architecture and Requirements. The Project Plan includes schedules of varying detail (including staffing), a brief product description (at most one paragraph per feature), resource descriptions (including re-used software, hardware, and computer support), development method outlines, and a list of open (unresolved) issues. After this early activity, PM is always present in the background, dealing with unresolved issues and ensuring that the project progresses toward completion in a timely and efficient manner.

4.2. Project Coordination

Most of the monitoring of the project’s progress is done by the Project Coordination group. Generally, each project has one project coordinator, who is charged with ensuring that deliverables promised by one group to another are delivered by the time they are needed. The coordinator schedules all project milestones and acts as a negotiator for all the development groups. She oversees or writes project plans, tracks all milestones and deliverables, and aids project audits.

Project Coordination starts during Product Definition and continues actively throughout the project’s lifetime. The primary mechanisms for deciding project coordination issues are planning and status meetings. Planning meetings plan the future, and status meetings make sure the present agrees with the plan. These meetings are often held at different levels of detail for different managerial positions. Meetings for developers and supervisors (first-level managers) go down to the individual deliverable being produced, whereas meetings for department heads and directors look at larger time scales, such as project phases.

To create the project’s schedules, the project coordinator often starts with a marketing date and must then fit development into a schedule designed to finish by that date. Although there are some automated tools to support some aspects of project coordination and tracking, much (most) of the work is still done by hand. (Tools are discussed in more detail in Section 7.) Some of the basic problems are:

1. People do not reliably inform project coordinators of a task’s completion or delays,

2. People try to prevent lateness by moving dates without dependency information,

3. Current tools do not communicate with each other,

4. Tools do not provide a way to selectively view time slices,

5. It is awkward to talk about partial completeness of tasks,

6. The project coordinator must have basic knowledge of all development processes and groups on a global scale.

AT&T has tried individual coordinators for parts of projects or for types of tasks, but has had more success with the global coordinator.
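Problem 2 above (moving dates without dependency information) is the kind of gap a dependency graph can close. The following is a minimal Python sketch, not a description of any AT&T tool: the task names, durations in working days, and the functions `earliest_finish` and `slipped_tasks` are all hypothetical. It computes earliest finish dates with a forward pass in dependency order, then reports every task whose date moves when one task slips.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: name -> duration in working days.
DURATIONS = {"spec": 10, "design": 15, "code": 20, "test_plan": 5, "system_test": 10}
# Hypothetical dependencies: task -> tasks that must finish before it starts.
DEPS = {
    "design": {"spec"},
    "code": {"design"},
    "test_plan": {"spec"},
    "system_test": {"code", "test_plan"},
}


def earliest_finish(durations, deps):
    """Forward pass: a task starts when its last prerequisite finishes (day 0 if none)."""
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}
    for task in TopologicalSorter(graph).static_order():  # prerequisites come first
        start = max((finish[p] for p in deps.get(task, set())), default=0)
        finish[task] = start + durations[task]
    return finish


def slipped_tasks(durations, deps, task, extra_days):
    """Return {task: days moved} for every finish date affected by one slip."""
    before = earliest_finish(durations, deps)
    changed = dict(durations)
    changed[task] += extra_days
    after = earliest_finish(changed, deps)
    return {t: after[t] - before[t] for t in durations if after[t] != before[t]}
```

With these numbers, slipping `design` by five days also moves `code` and `system_test` by five days, while slipping `test_plan` by five days moves nothing else, because `test_plan` finishes well before `code` does. That slack is exactly the information lost when dates are moved by hand.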

4.3. Meetings

There are three basic meetings used for management, scheduling, and tracking of project development: planning, status, and integration meetings.

4.3.1. Planning

The planning meetings decide the project’s goals, deliverables, schedules, and task assignments. These meetings are held either weekly or bi-weekly.

Planning meeting agendas address the high-level view of the project and how it should be organized. A planning meeting for a project that is just beginning (in effect, planning the plans) may include:

1. software development planning,

2. hardware development planning,

3. integration planning,

4. software responsibilities,

5. summary of meeting, and

6. open discussion of things for the next meeting.

Most of the items above are discussed in terms of what milestones are reasonable and what deliverables belong to each milestone. During the open discussion, special tasks and problems are brought up so that they may be investigated before the next meeting and discussed then if necessary. A list of important tasks and problems is kept. Examples of special tasks might be deciding how to define certain deliverables, or how to define terms like “quality” with respect to some aspect of system performance.

4.3.2. Status

The status meetings review the current state of a project, comparing the tasks completed with those that should have been completed according to the project’s schedule. One should note that planning meetings do not run just one or two weeks ahead of status meetings, but substantially (several months) ahead of them.
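The core comparison a status meeting makes, completed work against what the schedule says should be complete by now, can be sketched in a few lines. This Python example is illustrative only: the deliverable names, dates, and the `status_report` function are hypothetical, and real schedules would also need a way to express partial completion, which the text notes is awkward.

```python
from datetime import date

# Hypothetical schedule: deliverable -> (due date, completed?)
SCHEDULE = {
    "feature spec A": (date(1986, 9, 1), True),
    "feature spec B": (date(1986, 9, 15), False),
    "external design": (date(1986, 10, 1), False),
    "test plan": (date(1986, 11, 1), False),
}


def status_report(schedule, as_of):
    """Compare completed deliverables with those due on or before `as_of`."""
    report = {"done": [], "late": [], "pending": []}
    # Walk deliverables in due-date order so each list reads chronologically.
    for name, (due, completed) in sorted(schedule.items(), key=lambda item: item[1][0]):
        if completed:
            report["done"].append(name)
        elif due <= as_of:
            report["late"].append(name)  # should be done by now, but is not
        else:
            report["pending"].append(name)
    return report
```

For example, `status_report(SCHEDULE, date(1986, 10, 6))` lists "feature spec B" and "external design" as late and "test plan" as still pending.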

The agenda of status meetings changes depending upon the age of a project. In the case of a new project, some of the following may be discussed:

1. problems with coordination of development start up at multiple sites,

2. additions and deletions to parts of the project (this has repercussions in planning meetings),

3. problems with tools, equipment, and other resources needed for development,

4. contents and completion of the documentation plan for the project,

5. changes to items being worked on since the last meeting,

6. discussion on how to track software development progress, and

7. methods, including verification methods, to be implemented.

On the other hand, a somewhat older project with several releases in the field may discuss:

1. what fixes have been put into various releases,

2. importance of new bugs and how soon they need to be fixed,

3. problems with fixes that have not yet been completed,

4. dates for system testing of new versions.

Also in older projects, the status meeting handles most of the short-term scheduling and planning work.

4.3.3. Integration

Integration meetings concern the building of new releases (versions, revisions) of a product and the problems encountered in the process. These might include code changes that conflict with other code, or the discovery of an inadequacy in an integration tool. Problems are resolved to ensure the integrity of the product.

A typical integration meeting might include the following on its agenda:

1. tool problems,

2. laboratory hardware (new and old),

3. modification requests (MRs), i.e. bug fixes and product enhancements,

4. development workspaces,

5. special items such as introducing new tools, and

6. integration procedures.


Another meeting closely related to the integration meeting is the MR review meeting. The MR review meeting reviews all pending MRs on the project and decides whether their status should be changed. A list of the most critical bugs is maintained so that fixing them receives the most attention. The MR review meeting is composed of specialists from each part of the project, to provide expertise in all areas when deciding the nature of the MRs. Several members are from the Quality Development group as well. An MR appears in the integration meeting only after it has been fixed.

5. Architecture

The system’s Architecture is developed from the Technical Prospectus and the Requirements by the Architecture group. The Architecture is a very high level design document that specifies how the system will be implemented to support the features in the Technical Prospectus and the Requirements. In Architecture, as in other processes, history and experience are the major tools. This section briefly describes the Architecture Definition process and the creation of a work breakdown structure.

5.1. Architecture Definition

This description of Architecture Definition is based on the Architecture Definition used for a family of AT&T switching systems. Almost every system has a predecessor which provides an immediate basis for the new system’s architecture. If the new system is “just” a new release or an extension of an existing system, then it may re-use both hardware and software. Even when a totally new system is designed, it often replaces another system which has some related functionality.

For the first switch in this family, experience was drawn from other related systems to create the Technical Prospectus. Three areas visible in the first switch’s Technical Prospectus can be seen in all of the switches:

1. interactive development: the telephone user and the functionality that he sees,

2. system administration: the customer’s person in charge of programming and maintenance,

3. system maintenance: installation, use without user intervention, reliability, audits, and self-diagnostics.

Consideration of the support required by each of these areas led to a layered architecture with a kernel operating system (see Figure 5). Each layer in the product provides primitives for the layer above. Interfaces between the areas were defined in terms of the primitives that each layer in the architecture provided. With each new system in the family, new features are added by looking in the existing architecture and finding where the appropriate primitives are provided or where they should be added. General opinion at AT&T is that this architecture was quite well designed, as it has supported a successful family of switches without becoming unbearably complex. The same architecture has been used for different hardware (including microprocessors with different hardware architectures).
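One property of a layered design like this can be checked mechanically: no layer should call upward into a layer that is supposed to build on it. The Python sketch below is only an illustration under stated assumptions. The module names and layer numbers are hypothetical (the actual layers are those of Figure 5), and it uses a relaxed rule that allows calls within a layer or to any lower layer; a strict variant would require the callee to sit exactly one layer below.

```python
# Hypothetical modules and layers; the real layers are those of Figure 5.
# A higher number means a higher layer; 0 is the kernel operating system.
LAYERS = {"kernel": 0, "services": 1, "features": 2}


def layering_violations(calls, layers=LAYERS):
    """Return (caller, callee) pairs where a layer uses primitives from a layer above it.

    Calls within a layer or down to lower layers are allowed; upward calls
    break the rule that each layer only provides primitives to the layer above.
    """
    return [(caller, callee) for caller, callee in calls if layers[callee] > layers[caller]]
```

Given the calls `("features", "services")`, `("services", "kernel")`, and `("kernel", "features")`, only the last is reported, since the kernel would then depend on a feature-level module.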


5.2. Work Breakdown Structures

Work breakdown structures are derived from the Technical Prospectus and the Requirements. In general, large areas are mapped onto company structures such as departments or groups (for example, system test). Individual features are eventually mapped to individual developers.

Initially, Systems Engineering sends the Requirements to the various supervisors who will be involved in the project. SE meets with the supervisors to determine answers to:

1. How many people are needed, and are there enough people?

2. Is the project technically feasible?

To determine the answers, the supervisors rely primarily on personal experience. They do a quick-and-dirty high level design (not generally committed to paper) to determine a basis for staffing. This quick-and-dirty method has steps like:

1. choose a large section of the project,

2. categorize its features,

3. fit the features to the Architecture and to people.

In fitting people to features, supervisors use a ranking of employees by ability. Sometimes a supervisor maintains a concrete list, but more often this is a mental list. In this way supervisors try to allocate the most difficult jobs to the best people. After this rough design and fitting of people to features, the questions of whether the project is technically feasible and whether there are enough people should be answered.
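The "most difficult jobs to the best people" rule amounts to a greedy pairing of two ranked lists. The Python sketch below is a deliberate simplification and not a method the text describes in code: it assumes each developer takes one feature, that difficulty can be scored numerically, and the function names and sample data are hypothetical. Leftover features fall out as a direct answer to "are there enough people?".

```python
def assign_by_ability(features, developers):
    """Give the most difficult features to the most able developers.

    features: feature name -> estimated difficulty (higher is harder)
    developers: names ranked from most to least able
    Returns feature -> developer; None marks a feature nobody is left for.
    """
    hardest_first = sorted(features, key=features.get, reverse=True)
    return {
        feature: developers[i] if i < len(developers) else None
        for i, feature in enumerate(hardest_first)
    }


def staffing_shortfall(assignment):
    """Features left unstaffed, i.e. the answer to 'are there enough people?'."""
    return [feature for feature, person in assignment.items() if person is None]
```

With three features rated 8, 3, and 1 and two ranked developers, the two hardest features are staffed and the easiest one is reported as the shortfall.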

Next, Project Management and Project Coordination (PC) determine dependencies and track progress using planning and status meetings. Estimates of work and milestones are developed in a forum where people make collective decisions on work estimates. PM and PC notify supervisors of late items and coming events. If a milestone is missed, then either functionality is dropped or the delivery date slips. This decision is usually made by Project Management or Systems Engineering.

5.3. Some Psychology

There is a lot of psychology and politics involved in determining system architectures and work breakdown structures. For instance, if development wants to build a project, then they tend to underestimate its cost. On the other hand, if they are not interested in building it, then they overestimate the cost. Planning is the mechanism by which Systems Engineering and development reach a compromise. If either side is allowed to win out over the other, the project usually fails. If development is interested but SE has no market, then the effort is wasted. On the other hand, if SE has a market and development is uninterested, then development does a poor job because they do not care about what they are working on. Understanding people’s wants and needs is the biggest problem of project development.

One example of this is the Architecture of the switches mentioned earlier. The Architecture is based on primitives and layers, but it could also have been organized vertically by feature. Why was one chosen over the other? One hopes that technical considerations, such as how well the architecture supports current and proposed extensions, prevail. Often, however, some approach has a strong “political” spokesperson who argues convincingly for it. And there is the ever-present history mechanism: looking back over several generations of switches, one sees that “new” architectures appear only about every 15 years.

6. Development

Development’s two major products are the Feature Specifications and the Development Code. The Feature Specifications are the exact specification of how each feature will act, including error conditions. The Development Code is the code expected to be released; it must pass the various stages of integration and testing. This section looks at the developer’s position in the life cycle in terms of both new development and old development (fixing bugs).

6.1. New

The developer is basically at the bottom of the development structure, looking up to see the entire project. At the beginning of a project, it has been beneficial to have other groups such as System Test review the various specification and design documents. Near the end of development, it has been beneficial to have an “in house” system to use as a test. (This system is not in a testing lab; it is the phone system used in day-to-day work.) Often, however, the developer cannot get a big picture of the system he is working on. In other words, it can be very difficult to understand interactions and dependencies between his part of the project and the rest of the project. A system that allows the developer to see these dependencies would be very welcome.

The environment that the developer works in is currently a collection of tools built on top of other tools to force them to work together. For instance, the developer uses a combination of tools from the Local Administrative Tool Kit (LATK) and the Object Generation System (OGS) to create workspaces in which to work. OGS is in turn built on top of MESA, which is built on top of SCCS. While most of these things work fairly well, there are some failings, especially in maintaining dependencies between modules, that cause problems when building the system. In addition, there is very little connection between development code, the Modification Request (MR) system, and the Project Document (PD) system. This lack of connection results in much time spent tracking down the correct person to talk to when a problem occurs. Or, in the case of a bug report (MR), there is no automatic notification to other releases that the bug may also be present there. This last problem is most important when multiple versions of a system are being maintained, possibly during beta-testing or controlled introduction.

Other problems that occur while a developer is coding a new system include:

1. Large amounts of code make it difficult to find definitions and other things,

2. Debugging a large (real-time) system can be very difficult,

3. The edit-build-test-debug cycle can take a very long time, especially when using multiple machines,

4. Communication: finding the right people and information quickly and easily.

Some (partial) solutions to these problems include:

1. vi with tags: a version of vi that uses tags generated by the compiler and other utilities, allowing vi to be used as a browser. A few keystrokes find the definition of a define, variable, or function and put it in a separate window.

2. A complicated debugging simulator that runs on the development machines rather than the target machines makes debugging much easier.

There are also some tools that index the error messages so that one can find where an error message was generated, but these are not fully developed.

3. One reason for the length of the edit-build-test-debug cycle is building with cross-compiling and downloading. Reliable high-speed communications can alleviate some of the tedium and frustration.

4. For communication, enhancements to mail and a bulletin board system have worked well. An electronic calendar system to help schedule appointments would be helpful, as possibly would a voice storage phone system (i.e. a centralized phone answering machine).
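The "vi with tags" idea in item 1 rests on a simple index file mapping each name to the file and location of its definition. The Python sketch below assumes the common ctags layout of one entry per line, `name<TAB>file<TAB>address`, where the address is a line number or a search pattern; the format produced by AT&T's compiler-generated tags is not described in the text, so treat this as an assumption. The function names are hypothetical, and extension fields written by later ctags versions are not parsed.

```python
def load_tags(text):
    """Parse ctags-style lines of the form name<TAB>file<TAB>address."""
    tags = {}
    for line in text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # skip blank lines and header lines written by later ctags versions
        name, filename, address = line.split("\t", 2)
        # A name may be defined in several files, so keep every location.
        tags.setdefault(name, []).append((filename, address))
    return tags


def find_definition(tags, name):
    """Return the (file, address) pairs for name, or an empty list."""
    return tags.get(name, [])
```

An editor would then open the returned file and jump to the address, which is the "few keystrokes" lookup the text describes.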


6.2. Old

“Old” development refers to maintaining many releases simultaneously, including fixing their bugs. One of the biggest problems here is a bug fix that is needed in releases both older and newer than the current one. Currently, a Modification Request is assigned to one person on one release. If he realizes that the problem may be more widespread, then he may be able to search out the person responsible for a similar area in the other release(s) and notify him. Often this is not possible. The rest of this section details this problem and some ideas for solving it.

In general, bugs are handled through the MR mechanism in the following manner:


1. an MR is received from a customer, developer, or whoever,

2. it is assigned to someone to investigate (and plan a fix),

3. it is fixed,

4. the fix is sent to integration for inclusion in the next system build,

5. System Test then verifies by testing that the problem is fixed.
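The five steps above form a simple linear lifecycle, which can be modeled as a state machine that rejects out-of-order transitions. This Python sketch is illustrative only: the state names, `NEXT_STATE`, and the `ModificationRequest` class are hypothetical, not the actual MR system. It also omits the rework loop (for example, a fix failing System Test) that a real tracker would need.

```python
from enum import Enum


class MRState(Enum):
    RECEIVED = "received"    # step 1: from a customer, developer, or whoever
    ASSIGNED = "assigned"    # step 2: someone investigates and plans a fix
    FIXED = "fixed"          # step 3
    SUBMITTED = "submitted"  # step 4: sent to integration for the next build
    VERIFIED = "verified"    # step 5: System Test confirms the problem is fixed


# Each state has exactly one successor, following steps 1-5 in order.
NEXT_STATE = {
    MRState.RECEIVED: MRState.ASSIGNED,
    MRState.ASSIGNED: MRState.FIXED,
    MRState.FIXED: MRState.SUBMITTED,
    MRState.SUBMITTED: MRState.VERIFIED,
}


class ModificationRequest:
    def __init__(self, number, release):
        self.number = number
        self.release = release  # one MR covers one release, as described in the text
        self.state = MRState.RECEIVED

    def advance(self, new_state):
        """Move to new_state, rejecting any step taken out of order."""
        if NEXT_STATE.get(self.state) is not new_state:
            raise ValueError(
                f"MR {self.number}: cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
```

The single `release` field mirrors the one-MR-per-release limitation; that is precisely what makes migrating a fix to other releases a separate, manual problem.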

This process depends on some tools in LATK that link MR, MESA, and other tools together. At AT&T the Integration group oversees and resolves the integration problems with bugs and keeps track of which bugs are fixed in which versions and releases. This arrangement seems to work, and a few tasks, such as some of the MR status changes, are done by the LATK tools. But each Integrator is concerned only with a specific release and does not necessarily know about the others. A lot of time is spent taking care of these bug reports, and even more is consumed by migrating them to other versions and releases.

The process of migrating a bug fix is called a “bugout.” The major problem areas with bugouts are:

1. different releases may have different structure,

2. how much testing is necessary to ensure the fix,

3. which changes should be broadcast to other releases or versions.

Figure 6 shows a couple of releases, each with a few versions. It also includes examples of bugout paths and of feature porting, which often happens as well.

The problem of different releases having different structure refers to some software architecture changes. These changes are small in that they may not affect the system architecture, but source code diffs become unusable because the context of the diffs is not preserved across releases (or versions). Figure 7 shows how a single process may be broken into smaller ones, or even recombined into one again, in later versions. In some straightforward cases diffs can be reused, saving human labor, but most often some data structure or internal function has changed, forcing the fix to be re-invented.
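Why a diff stops applying can be shown directly: a context diff locates each change by the unchanged lines around it, so if those lines no longer appear in the target release, the change cannot be placed mechanically. The Python sketch below checks only that condition. It is a simplification (real patch tools also tolerate line offsets and some "fuzz"), and the code lines and the `find_context` name are hypothetical.

```python
def find_context(target_lines, context):
    """Return the first index where `context` appears as a contiguous run, or -1."""
    n = len(context)
    for i in range(len(target_lines) - n + 1):
        if target_lines[i:i + n] == context:
            return i
    return -1


# The unchanged lines that surround a fix made in the older release.
CONTEXT = ["state = IDLE;", "if (offhook)"]

# A closely related version: same code, shifted down by one line.
CLOSE_VERSION = ["/* v2 */", "state = IDLE;", "if (offhook)", "    state = DIALING;"]
# A restructured release: the same logic now goes through a new function.
RESTRUCTURED = ["set_state(IDLE);", "if (offhook)", "    set_state(DIALING);"]
```

The context is found in the closely related version (at index 1), so the diff could be reused there; in the restructured release it is not found at all, and the fix has to be re-invented by hand.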

When testing a fix, there is the question of how much testing is necessary. This question arises a second time when a fix is ported. If the fix was tested and verified in another release, then it probably does not need as much testing when it is migrated, or does it? In practice, it is possible to eliminate some testing, but interactions of the fix must be carefully examined first.

Another facet of different releases having different structure is the difference in features between two releases. In this case a table of features and releases (e.g. Figure 8) may be helpful in suggesting which releases need a bugout, especially when the bugout is localized to a specific feature. Perhaps a field could also be added to the MR to indicate which features and which versions a particular bug applies to. Then, MRs could be sent automatically to the appropriate Integrators. How much information the system can send and track is difficult to determine. Fixes and enhancements can require a lot of code and other work, even if they are expressible just as diffs to documents and code.

Figure 6. Bugout Paths (legend: feature, port path, bug, bugout path)

Draft
October 6, 1986

Figure 7. Architectural Changes

Figure 8. Bugout Table

Currently, AT&T people individually generate diffs for versions that are very closely related. For releases that are not closely related, they rely on the fixer finding out whom to contact on the other release(s) and forwarding the information about the fix to them.
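The feature-and-release table suggested for Figure 8, together with a feature field on the MR, would be enough to route a fix automatically. The Python sketch below assumes exactly that; the table contents, Integrator names, and the functions `bugout_candidates` and `route_mr` are hypothetical. It only suggests candidate releases, since whether a bug really exists in another release, and whether the diff still applies, still needs human judgment.

```python
# Hypothetical feature/release table in the spirit of Figure 8:
# release -> set of features it contains.
FEATURES_BY_RELEASE = {
    "R1": {"call forwarding", "speed dial"},
    "R2": {"call forwarding", "speed dial", "voice mail"},
    "R3": {"voice mail", "conference"},
}


def bugout_candidates(feature, fixed_in, table=FEATURES_BY_RELEASE):
    """Releases other than `fixed_in` that contain `feature` and may need the fix."""
    return sorted(r for r, feats in table.items() if feature in feats and r != fixed_in)


def route_mr(feature, fixed_in, integrators, table=FEATURES_BY_RELEASE):
    """Map each candidate release to the Integrator who should be notified."""
    return {r: integrators[r] for r in bugout_candidates(feature, fixed_in, table)}
```

For instance, a voice mail bug fixed in R2 is suggested for R3 only, while a call forwarding fix made in R2 is routed to R1's Integrator.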

7. Tools

The Tools group interacts with most other development groups to provide support for them. Most of the Tools group’s work is spent automating various parts of development, including control of source code, keeping and tracking bug reports, and control of project documents. The Tools group makes very few decisions concerning products, but does figure very heavily in the resources necessary to create the product. Most of the Tools group’s work comes early in a product’s lifetime, but tools do evolve to better meet the needs of developers during the product’s lifetime. This section is included in the paper here because most of the software tools developed to date interact with Development and Integration. Some categories of tools are those for source code control, modification requests, project documents, and project management.

7.1. Source Code Control

The  primary  source  code  control  mechanism  is  the  Object  Generation  System  (OGS).  OGS  provides 
the  ability  to  generate  multiple  versions  of  a software  product  from  a single  source  hierarchy.  It  adds 
functionality  on  top  of  MESA  (Management  Environment  for  Software  Administration)  which  is  a facility 
for  maintaining  hierarchical  structures  of  SCCS  (Source  Code  Control  System)  files. 

OGS uses a hierarchical directory structure in the UNIX file system for project management. The top level is the project (PJ) level. Successively lower levels are the system, process, and book levels. Each project has a base project area for source code, and each revision of the project can have its own base object area. This allows many revisions to be kept simultaneously while all share the same source code area. OGS utilities manage the various object code generation details by using MESA to get the appropriate source code and “make” to create the system object code. The MESA system maintains a hierarchy of source code files and their dependencies so that make files can be constructed automatically. Dependencies are handled partly by inspecting files and partly by giving the utilities knowledge about file types through special file suffixes. (Actual construction of the make files is done by OGS.) SCCS maintains multiple revisions of individual files using a forward delta storage scheme.
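The combination of suffix knowledge and file inspection described above can be illustrated with a minimal Python sketch that derives one make rule per source file. The suffix table, the function names, and the restriction to quoted `#include` lines are assumptions for illustration only; they are not the actual OGS or MESA conventions.

```python
import re

# Hypothetical suffix table: the real OGS/MESA suffix conventions are not
# given in the text, so these mappings are illustrative only.
SUFFIX_RULES = {
    ".c": ".o",  # C source compiles to an object file
    ".s": ".o",  # assembler source also produces an object file
}

# "Inspecting files": pick up quoted #include lines as extra prerequisites.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def object_name(source):
    """Map a source file to its object file by suffix; None if the suffix is unknown."""
    for suffix, target_suffix in SUFFIX_RULES.items():
        if source.endswith(suffix):
            return source[: -len(suffix)] + target_suffix
    return None  # unknown suffix: no rule can be generated automatically


def make_rule(source, text):
    """Build one make rule by combining the suffix table with #include inspection."""
    target = object_name(source)
    if target is None:
        return None
    headers = INCLUDE_RE.findall(text)
    return f"{target}: " + " ".join([source] + headers)
```

For example, `make_rule("call.c", '#include "call.h"\n...')` yields `call.o: call.c call.h`, while a file with an unrecognized suffix yields no rule at all and would have to be handled by hand.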

A developer typically follows these steps when using OGS:

1. setup - create a workspace of parallel source directories,

2. sget - get the individual source files to be changed, including locks to prevent multiple concurrent changes,

3. edit the source and possibly build a local copy to test the changes,

4. usubmit - submit the workspace to the Integrator for integration. This last step requests an MR number if there is one, informs the Integrator that this workspace has been submitted, “hides” the workspace by changing the ownership and permissions of its contents, and creates a special shell script that can be used to unify it with the rest of the system.
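The four steps can be modeled in a few lines of Python to show how the sget lock prevents two developers from changing the same file at once, and how usubmit freezes the workspace. The class and method names are hypothetical stand-ins for the LATK/OGS commands, and lock release after integration is deliberately left out of this sketch.

```python
class SourceBase:
    """Hypothetical shared source area with one edit lock per file."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> current text
        self.locks = {}           # file name -> developer holding the lock

    def sget(self, dev, name):
        """Check out a file for editing; refuse if another developer holds its lock."""
        holder = self.locks.get(name)
        if holder is not None and holder != dev:
            raise RuntimeError(f"{name} is locked by {holder}")
        self.locks[name] = dev
        return self.files[name]


class Workspace:
    """Hypothetical developer workspace, as created by 'setup'."""

    def __init__(self, dev, base):
        self.dev = dev
        self.base = base
        self.edits = {}  # file name -> working copy
        self.submitted = False

    def sget(self, name):
        self.edits[name] = self.base.sget(self.dev, name)

    def edit(self, name, text):
        if self.submitted:
            raise PermissionError("workspace has been submitted and is hidden")
        if name not in self.edits:
            raise KeyError(f"{name} was not obtained with sget")
        self.edits[name] = text

    def usubmit(self, mr=None):
        """Hand the workspace to the Integrator and make it read-only."""
        self.submitted = True
        return {"developer": self.dev, "mr": mr, "files": sorted(self.edits)}
```

The returned dictionary plays the role of the submission notice sent to the Integrator; in the real system this information travels with the generated shell script.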

The commands above are actually Local Administrative Tool Kit (LATK) wrappers around the raw OGS commands. This was done to help link the OGS system with other systems, such as the MR system for Modification Requests. It also relieves the individual developer of needing to know the details of more than one code management system.

AT&T has several variations on its source code management. This is partly because each project tends to modify the “standard” tools to fit its needs. For example, the MESA system has capabilities not yet supported by OGS. These include Independent MESA and CASSI. Independent MESA is a simple mechanism by which development may proceed on multiple machines simultaneously. It also allows more than one user to work on the same source file. However, in the latter case the system stores the multiple copies and requires human intervention to produce a single merged copy. CASSI is an attempt to link the MR system (described below) and the source code systems more closely. Some parts of AT&T other than Middletown have been experimenting with it, but problems with it have prevented its widespread use.

7.2. Modification Requests

The Modification Request system is essentially a large database system for tracking and storing bug reports and enhancement requests. It is used throughout AT&T as the Change Management Tracking System (CMTS). The PCS/MR system at Middletown is an enhancement, primarily for user friendliness, including automatic notification via electronic mail of certain status changes that occur to MRs.

MRs include the expected data about origin, products, severity, and current status. The status of an MR can be:

a) ui - under investigation, i.e. brand new,

b) pa - pending approval, i.e. a fix has been found but not approved by the MR review board,

c) bf - being fixed, i.e. someone is actually trying to fix it,

d) dup - duplicate of another MR,

e) def - deferred, i.e. not important enough to be considered at this time,

f) nc - no change, i.e. not a problem, or not to be implemented after some consideration,

g) c - complete, i.e. fixed or added.

Of the status possibilities above, complete, no change, deferred, and duplicate are considered to be resolved. Deferred MRs can be “unresolved” at a later time if it is decided that they should be implemented.
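The status codes and the resolved/unresolved split can be captured as a small Python enumeration. The code names follow the list above; the choice to return an “unresolved” deferred MR to the ui state, and to allow only deferred MRs to be reopened, are assumptions of this sketch rather than documented CMTS behavior.

```python
from enum import Enum


class MRStatus(Enum):
    UI = "under investigation"
    PA = "pending approval"
    BF = "being fixed"
    DUP = "duplicate"
    DEF = "deferred"
    NC = "no change"
    C = "complete"


# The four statuses the text counts as resolved.
RESOLVED = {MRStatus.C, MRStatus.NC, MRStatus.DEF, MRStatus.DUP}


def is_resolved(status):
    return status in RESOLVED


def unresolve_deferred(status):
    """Reopen a deferred MR. Assumption: it goes back to 'ui', and only
    deferred MRs may be reopened this way."""
    if status is not MRStatus.DEF:
        raise ValueError("only deferred MRs can be unresolved")
    return MRStatus.UI
```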

An MR may also acquire “child” MRs. These are used to note various resolutions and to note that other items (e.g. documents) may need to be updated as a result of the parent MR. In the latter case, child MRs have the same effect as parent MRs and require their own status and resolution changes.
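Because each child MR carries its own status, finding everything still outstanding for a change means walking the whole family of MRs. A minimal Python sketch, assuming a simple in-memory tree and hypothetical MR numbers:

```python
from dataclasses import dataclass, field

RESOLVED = {"c", "nc", "def", "dup"}  # status codes the text treats as resolved


@dataclass
class MR:
    number: str
    status: str = "ui"
    children: list = field(default_factory=list)

    def spawn(self, number):
        """Attach a child MR, e.g. for a document that must also be updated."""
        child = MR(number)
        self.children.append(child)
        return child


def open_items(mr):
    """List every MR in the family (parent first) that is still unresolved."""
    pending = [] if mr.status in RESOLVED else [mr.number]
    for child in mr.children:
        pending.extend(open_items(child))
    return pending
```

A completed parent with an unresolved documentation child still shows up as open work, which matches the rule that child MRs need their own resolution.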

7.3. Project Documents

The Project Document (PD) system is a library of project documents. It maintains control over a project’s documentation in a manner similar to SCCS. A project document is given a mnemonic identifier that includes its producer, its type, and its sequence number. As a document is revised, the system assigns release numbers to it so that people know whether or not they have the latest version. Documents may also have different status codes depending on their state of completion. These status codes include draft, preliminary, changed, final, and obsolete. The PD system allows all of a project’s documents to be baselined and to have MRs written against them. It also makes it easier to distribute project documents to project members. In fact, a project notebook, including project documents describing project procedures for documents, MRs, reviews, coding standards, and other things, is given to every project member.
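The identifier, release numbering, and status codes of a project document fit naturally into a small record type. The text does not give the actual mnemonic format, so the `PRODUCER-TYPE-NNN` layout below is a hypothetical choice, as are the field and function names.

```python
from dataclasses import dataclass

STATUSES = ("draft", "preliminary", "changed", "final", "obsolete")


@dataclass
class ProjectDocument:
    producer: str
    doc_type: str
    sequence: int
    release: int = 1
    status: str = "draft"

    @property
    def identifier(self):
        # Hypothetical layout; the real PD mnemonic format is not described.
        return f"{self.producer}-{self.doc_type}-{self.sequence:03d}"

    def revise(self, status):
        """Record a new revision: bump the release number and set a status code."""
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r}")
        self.release += 1
        self.status = status


def is_latest(copy_release, doc):
    """Answer the reader's question: is my copy the latest release?"""
    return copy_release == doc.release
```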

7.4. Management Tools

AT&T has a few management tools, but they help with only a small part of the management tasks. These tools include the Milestone Schedule and Tracking System (MSTS) and a program called Timeline that runs on an AT&T PC. A manual procedure (tool) called the Priority Feedback System (PFS) is sometimes used by managers to help with work assignments and monitoring.

Currently, MSTS allows schedules to be kept on-line on the development computers. Each milestone records the contractor, the producer, the consumer, the original due date, the current due date, and the previous due date. MSTS has no visual representation other than tables, and it lacks a good mechanism for setting up and maintaining dependencies. Also, MSTS is used across entire projects and has no mechanism for viewing a subset of interrelated milestones. It is very much just a database or record-keeping system. MSTS is used by the Project Coordinator and can hold milestones for the entire project.
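An MSTS-style milestone is essentially a flat record with three due dates. The Python sketch below mirrors the fields listed above and shows how rescheduling keeps the previous date; the method names and the slip calculation are illustrative assumptions, and, like MSTS itself, the record has no dependency links.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Milestone:
    contractor: str
    producer: str
    consumer: str
    original_due: date
    current_due: date
    previous_due: Optional[date] = None

    def reschedule(self, new_due):
        """Move the current due date, remembering the one it replaces."""
        self.previous_due = self.current_due
        self.current_due = new_due

    def slip_days(self):
        """Total slip relative to the original commitment."""
        return (self.current_due - self.original_due).days
```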

Timeline runs on personal computers and supports dependencies, along with some critical path and cost analysis capabilities. It still has some trouble handling everything for a large project, but it has proven useful for individual managers in keeping track of their groups.

For milestones, one would like a system with the PERT/CPM abilities of Timeline, the scope of MSTS, selective viewing of dependencies, and automatic notification of approaching milestones to the people producing deliverables and to the Project Coordinator. L. Beaumont has started experimenting with a relational database to see whether most of these capabilities can be developed using various entities to represent documents, milestones, dependencies, and people.
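The PERT/CPM capability mentioned above comes down to a forward and a backward pass over a dependency graph. This is a generic Critical Path Method sketch in Python (standard library `graphlib`, Python 3.9+), not Timeline’s implementation; task names and durations in the usage example are made up.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Critical Path Method on an activity-on-node network.

    durations: task -> duration; deps: task -> set of prerequisite tasks.
    Returns (project length, zero-slack tasks in topological order).
    """
    graph = {t: deps.get(t, set()) for t in durations}
    order = list(TopologicalSorter(graph).static_order())
    # Forward pass: earliest finish of each task.
    early_finish = {}
    for t in order:
        start = max((early_finish[p] for p in deps.get(t, ())), default=0)
        early_finish[t] = start + durations[t]
    length = max(early_finish.values())
    # Backward pass: latest finish that does not delay the project.
    successors = {t: [s for s in durations if t in deps.get(s, ())] for t in durations}
    late_finish = {}
    for t in reversed(order):
        late_finish[t] = min((late_finish[s] - durations[s] for s in successors[t]),
                             default=length)
    critical = [t for t in order if late_finish[t] == early_finish[t]]
    return length, critical
```

With tasks spec (5), design (10), docs (3), code (15), and test (7), where docs depends only on spec, the project takes 37 units and docs is the only task with slack.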

To help manage people and work assignments, some managers use the Priority Feedback System. This is not an automated tool, but it could be automated, at least in part. It consists of a form on which a worker and his manager order the worker’s tasks by priority. The worker and his manager then meet periodically to review the tasks and set personal milestones for the worker. This allows the worker’s progress to be tracked in a more quantifiable way. It also keeps the manager better informed about each individual’s workload and about whom new tasks might be assigned to. This tool sometimes helps with work breakdown structures.

8. Integration & Quality Development

Integration is the process of forming one system from many developers’ code. Quality development includes testing the integrated system in a “white box” manner to maintain system stability. AT&T has found it advantageous to split these two tasks.

8.1. Integration

The Integrator is an individual assigned to a project to do the system integration. His responsibilities include collection of workspaces, resolution of integration conflicts, maintenance of each integration’s MR list, creation of the new system, and integration testing of the new system.

The Integrator begins by collecting the workspaces that have been submitted by the developers. Collecting workspaces includes checking that all the developers have submitted their work to the system and nagging those who have not. After all the workspaces have been acquired, the Integrator can add the source to the system.

When updating the system source, the Integrator must resolve any conflicts. For instance, more than one MR may have required the same source file to change in separate places. If the changes are not obviously independent, then the Integrator must have the developers agree on some merging procedure. Another possibility is special cases that the integration tools cannot handle automatically, which require the Integrator’s intervention. He must also make sure that the status of each MR is updated and that the list of MRs included in a particular system is updated.
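One mechanical first check for “obviously independent” changes is whether two MRs touched overlapping line ranges of the same file. The Python sketch below assumes each MR’s edits are available as inclusive line ranges per file (a hypothetical input format). Non-overlapping edits can still interact semantically, so an empty result only means no obvious conflict was found.

```python
def ranges_overlap(a, b):
    """True if two inclusive line ranges (start, end) share any line."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(changes):
    """changes: MR number -> {file: [(start, end), ...]} of lines each MR touched.

    Returns (file, mr1, mr2) triples where two MRs changed overlapping lines,
    i.e. changes that are not obviously independent.
    """
    conflicts = []
    mrs = sorted(changes)
    for i, m1 in enumerate(mrs):
        for m2 in mrs[i + 1:]:
            for path in sorted(changes[m1].keys() & changes[m2].keys()):
                if any(ranges_overlap(r1, r2)
                       for r1 in changes[m1][path]
                       for r2 in changes[m2][path]):
                    conflicts.append((path, m1, m2))
    return conflicts
```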

After the system source is updated, the new system is made. Most of the time, this runs without any trouble. Occasionally, however, some source code dependencies may be missed, and the Integrator will have to fix them by hand. To minimize this, a complete rebuild of the system from scratch is done from time to time (possibly once per week).

After the object code has been produced, the Integrator downloads it into the target machines and runs some simple tests. The download program is also reasonably clever in that it downloads only newly built code. The tests that the Integrator runs are fairly straightforward. They just check that the current system starts up and appears to run correctly.
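The text does not say how the download program recognizes newly built code. A minimal Python sketch of one way to do it, assuming the target keeps a content digest of each object it holds (SHA-256 here is a modern stand-in, not the original method):

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


def files_to_download(built, on_target):
    """built: name -> object bytes; on_target: name -> digest of the target's copy.

    Returns the names whose object code is new or differs from the target copy.
    """
    return sorted(name for name, data in built.items()
                  if on_target.get(name) != digest(data))
```

Comparing digests rather than timestamps avoids resending an object that was rebuilt but came out byte-for-byte identical.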

The Integrator uses tools from OGS and LATK. In the process described above, the Integrator must set up several environment variables that describe the software base, the target hardware, and the system type (e.g. development or field). There are some tools to help with setting these parameters and examining what OGS will do with them. The Integrator uses the command “review” to help keep track of the MRs in each revision of the system. “Runinstall” takes as input the developers’ output from “usubmit” and installs the workspaces using MESA. Finally, new object code is produced using “runbld”, which uses dependency information to rebuild those parts of the system that need it.
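Rebuilding only “those parts of the system that need it” means finding every target that depends, directly or transitively, on a changed file. The Python sketch below shows that computation on a hypothetical dependency table; it is not runbld’s actual algorithm, which the text does not describe.

```python
from collections import deque


def stale_targets(deps, changed):
    """deps: target -> list of prerequisites; changed: set of edited files.

    Returns every target that depends, directly or transitively, on a changed file.
    """
    # Invert the graph so we can walk from a changed file to its dependents.
    dependents = {}
    for target, prereqs in deps.items():
        for p in prereqs:
            dependents.setdefault(p, []).append(target)
    stale, queue = set(), deque(changed)
    while queue:
        item = queue.popleft()
        for target in dependents.get(item, []):
            if target not in stale:
                stale.add(target)
                queue.append(target)
    return stale
```

For instance, with `call.o` built from `call.c` and `call.h`, and a linked `switch` image built from `call.o` and `route.o`, editing `call.h` marks `call.o` and `switch` for rebuilding but leaves `route.o` alone.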

Overall, the system works. Most dependencies are handled correctly. There is only one collection of source code, with separate object code areas for each revision. Also, the tools have used electronic mail very effectively as a means of communicating error conditions to the Integrator. However, the Integrator has some problems:

1. The system knows only about a predefined set of file suffixes, so not all files are handled automatically,

2. Tools are so loosely coupled that there is too much room for mistakes in parameters and order of execution,

3. Aside from closing the dependency holes in 1, one would also like dependencies to be more exacting (i.e. there are still times when the system recompiles more than it needs to).
8.2. Quality Development

It was recognized that some problems concerning the quality and stability of the system fell through the cracks between development and system test. Quality Development is designed to seal those cracks at the point between integration and system test.

8.2.1. Motivation

The major motivation for QD is that customers get extremely upset when bugs appear where none originally existed. The appearance of bugs in places where they previously did not exist makes the system look like it is deteriorating rather than improving. The desire to have the system always improve results in some friction between marketing and development, the marketing view being that if the customer does not see a bug, then the system is not broken and should not be fixed.

To achieve stability, it must be built into the development process. This is not easy to do because of the following problems:

1. it is difficult to determine and control the stability of a system,

2. it is unknown how to prove when a system is stable, because stability is not well defined,

3. stability is very closely tied to reliability, but the relationship is not clear,

4. designing stability in makes the front of the development process slower, with a hoped-for speedup at the end (e.g. later practical demonstrations disturb management),

5. the tradeoff between making things work and maintaining the status quo is not always clear.

Some of these problems are touched upon in the next section, but QD is very new and therefore not as clearly described or defined.

8.2.2. Implementation

QD tests the system in a “white box” manner. It purposefully attacks weaknesses in the system to improve its robustness (quality). QD stresses internal interfaces and support features that are not directly accessible to the customer. This means that QD uses the Development Specifications rather than the External Design that System Test uses. In some cases this is similar to the “classical view” of integration testing.

QD is very prominent in the MR review process, mediating concerns and disputes about system stability. MRs are reviewed by people from many areas to best determine their importance and impact, but QD has the final say on which MRs are included in a system.

Although it occasionally appears that QD has its fingers in everything from development through delivery, its primary function is to maintain system stability. System stability means ensuring that a system does not change too rapidly and that new bugs do not appear where there were no bugs previously. (In other words, QD makes sure that fixes do not break other things.) To ensure stability, one often builds and maintains separate releases for individual customers or small groups of customers. This can put a large amount of stress on the configuration management system, the MR system, and the Integrator.

During a product’s lifetime, the emphasis on stability changes. In the beginning, almost any fix is accepted because the functionality of the system is still being completed. In the middle, only bug fixes that are believed or can be proven not to break anything else are accepted. Finally, near the product’s end, only those fixes guaranteed not to break anything else are accepted. Unfortunately, much of this guarantee is still based on the intuition of Development and QD. Occasionally, System Test will find problems.

9. System Test

System Test is the group that acts as the user’s advocate to ensure that the system the developers produce meets its specification. System Test tests the system in a “black box” manner to keep the user’s view. System Test involves both hardware and software.
System Test begins with the development of the project’s Technical Prospectus (TP) and External Design, by reviewing them for completeness and testability. System Test tries to keep Development honest by simulating the user so that these documents are not ambiguous. Once the TP is finished, System Test begins its development process with the creation of the System Test Plan (STP). The STP includes detailed tests of each feature under normal and exceptional conditions over a variety of system configurations. The STP also includes a description of the tools, peripherals, and simulations to be used to test the system.

Some stress between System Test and Development results from a couple of inherent problems. One problem is that System Test provides negative feedback (error reports) to Development that is not always appreciated. To reduce this stress, System Test interacts with Development to make sure that it is doing things correctly. One such interaction is Development’s review of the System Test Plan to ensure its accuracy and completeness. Another problem occurs if Development slips its schedule. In this case, System Test is often put under pressure to finish in less than the scheduled amount of time so that the product’s delivery date does not slip.

Often, as System Test develops its tests, a pre-release is received from Development that allows ST to test its testing tools and generally see how the system works. This is especially important in the setup of new hardware. It should also be noted that System Test is a large project and has a lot of things to manage. Therefore, System Test keeps everything, both hardware and software, under some form of version control. This enables System Test to determine more effectively the location of a bug by component and release. Sometimes System Test also gets advice from Field Support people (customer engineers) to improve its model of the system’s expected users.

Ideally, when it is time for the new system to undergo system test, System Test receives a final release of the software. In practice, however, this is almost impossible. Instead, System Test receives a “smear” of releases. There are several problems with this, including:

1. Not all tests can be run on time due to missing functionality (i.e. development is behind schedule),

2. Many (most) tests are run several (many) times on each revised release in the smear to verify that fixes have not broken things that previously worked.

One way that AT&T has helped solve the regression testing and user simulation problems in the observed products was by developing an automated testing system called GAMUT [LaSS] that includes both hardware and software. The GAMUT system uses computers and special hardware to automatically and reproducibly execute tests that simultaneously simulate many users of a telephone switch.
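Rerunning tests on each release in the smear is useful mainly because the results can be compared across releases. A minimal Python sketch of that comparison, assuming results are recorded as pass/fail per test name (a hypothetical format, not GAMUT’s):

```python
def regressions(previous, current):
    """previous/current: test name -> True (pass) or False (fail) for two releases.

    Returns tests that passed on the earlier release but fail on the newer one,
    i.e. places where a fix appears to have broken something that worked.
    Tests not yet run on the newer release are not reported.
    """
    return sorted(t for t, passed in previous.items()
                  if passed and current.get(t) is False)
```

Each name in the result is a natural candidate for a new MR against the newer release.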

During System Test, MRs are used to report all bugs found. In fact, there is a lot of MR “cycling”, where MRs are filed by System Test, responded to by Development, and returned to System Test for retesting. In some cases MRs acquire “child” MRs, which are used to indicate problems in related subsystems or areas.

When a system passes System Test, not only is the tested code passed on, but a Factory Release Document is also produced. It contains information about:

1. the MRs fixed and what they affected,

2. special procedures or workaround patches,

3. special installation instructions, if needed,

4. compatibility with previous hardware and software, and

5. controlled introduction history.

These items describe the major changes that the Field Support and manufacturing people need to know about when producing and servicing the new product.
10. Field Support

Field Support is the section of the development process responsible for installation and maintenance of a system. Field Support has three substages, called controlled introduction, scheduled availability, and general availability. Field Support interacts with Development, System Test, Manufacturing, and Marketing. This section concentrates on the Controlled Introduction stage because it is most closely tied to system development; the other (last) two stages are primarily an expansion of production levels to accommodate full-scale marketing.

10.1. Controlled Introduction

Controlled Introduction is like a beta test in the “classical” model of the software life cycle and maps into the Qualification part of the AT&T life cycle in Figure 1. Controlled Introduction includes:

1. standard product for customer environments,

2. final technical evaluation of a new product,

3. evaluation of product performance,

4. customer acceptance criteria,

5. identification of necessary improvements, and

6. validation of product support processes.


To accomplish these tasks, CI has several subprocesses: requirements and schedules, customer selection, implementation planning, customer briefings, product order and delivery, field support, and customer evaluation.

Requirements and schedules are done in conjunction with Development, Project Management, Marketing, and Manufacturing. It is here that the number and kinds of customer sites are chosen and the schedules for testing various product features are determined.

Customer selection is based on the test requirements, customer needs and cooperation (friendly customers are wanted), and geographical location. Usually, controlled introduction is performed close to the development area because it may be necessary to get development to examine and fix problems.

Implementation planning is the part of the global project management work that develops CI milestones and project/customer commitments and makes sure that both are met.

Customer briefings are used to form a partnership with the customer so that CI can be as pleasant an experience as possible for the customer. These briefings include discussions of the customer’s detailed needs and schedules for when and how the system can be installed into the customer’s environment.

Product order and delivery comprises all the manufacturing processes necessary to build the customer’s system. This includes order processing, system customization, and quality control tests at the manufacturing site. During CI, the various processes used for manufacturing are tested and reviewed.

Field support is complete customer installation, including all components, wiring, administration, and installation tests. It also includes problem identification, problem resolution, customer traffic analysis, and system analyses.

Finally, customer evaluation surveys and interviews the customers about the system’s features, operation, performance, and documentation, including user and administrator opinions. Statistical analysis of the responses is used to understand the strengths and weaknesses of the system and its support.

10.2. Scheduled and General Availability

Scheduled availability is an intermediate period during which Project Management, Manufacturing, and National Product Scheduling carefully monitor and expand production facilities to handle the expected market load.

General availability is the point at which the product can be obtained simply by ordering it and having it delivered within an expected time interval. The product, its documentation, and its support processes are complete, and most services are provided following a standard procedure.

10.3. Tool Comments

Field Support currently has very few tools. There are tools for factory orders and some manufacturing. There are also tools for processing some of the system performance statistics. But there are few tools for the scheduling and coordination of Field Support. Some of the difficulties are:

1. the large number of people that Field Support depends upon,

2. these people are scattered all over the U.S. and possibly the world,

3. changes are so frequent that they occur almost continuously.

In other words, some large distributed system would be necessary, and it is not easy to define exactly how it should interact with the Field Support people.

11. Summary

This paper has been a summary of the AT&T software development life cycle. It is seen that basically a “waterfall” model is used. Each process in the life cycle has specific inputs and outputs, usually documents or code. The life cycle differs from the waterfall model in some respects by separating the specification of customer features from the internal architecture. Many groups have influence over more than one process in the development of a product, which provides checks and balances that promote a more coherent and easy-to-understand product structure.

We have seen that the organizational structure of AT&T is primarily by project, except for some management and marketing type functions. It appears to work fairly well in keeping the technical expertise focused on one entity while allowing management and marketing enough dissociation to make fair decisions about the direction and coordination of a project.

While touring the life cycle, we have discussed the products of each process and how they are produced. We have also looked at some of the tools in use to help automate these processes. Even though we saw only brief overviews, some strengths and weaknesses were described for most of these tools. Also, some suggestions for capabilities of future tools were mentioned.

This observation has provided invaluable experience for research into the automation of software development, from the development of program fragments through global tracking and high-level management. It has given the author much to think about.
SECTION I


PRODUCT OVERVIEW AND RATIONALE


INTRODUCTION


Product Overview and Rationale

Our project planning software automates the COCOMO cost estimation model. The software will also provide a project phase and milestone analyzer, which outputs estimated dates for project phases and due dates for the appropriate milestones.

Using this project planning software will reduce the errors that are made when these cost estimates are calculated manually. Also, automating the planning techniques will encourage project managers to use them.
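For reference, the calculation being automated can be sketched with Basic COCOMO (Boehm, 1981), where effort is a·KDSI^b person-months and development time is c·effort^d months, with the published coefficients for the organic, semidetached, and embedded modes. Which COCOMO level (basic, intermediate, or detailed) this package implements is not stated, so treat this as an illustration of the model rather than of the package’s code.

```python
# Basic COCOMO coefficients (Boehm, 1981):
#   effort   = a * KDSI**b    (person-months)
#   schedule = c * effort**d  (months)
MODES = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kdsi, mode="organic"):
    """Return (effort in person-months, schedule in months, average staff)."""
    a, b, c, d = MODES[mode]
    effort = a * kdsi ** b
    schedule = c * effort ** d
    return effort, schedule, effort / schedule
```

A 32 KDSI organic-mode project comes out at roughly 91 person-months over about 14 months, an average staff of six to seven people.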


SECTION II


USING THE PROJECT SCHEDULING SOFTWARE


LOGGING ON


The user should begin by starting up the S9000. The machine should already be on, and all the user needs to do is turn the brightness control knob to the right if it isn't already turned up.

The user first must log on to the machine. This is done by typing in the login the user has been assigned. Presumably, this is the same as the one given to the students of CS327, which is "cs327". The user should respond to the login prompt with the following:

cs327 <cr>        (cr = carriage return)

The user should wait for the system prompt, which has the following format, before typing anything else:

cs327<xx>:

where xx is the command line number.

The permissions for the present directory should also be set to read/write/execute for all users. This is done by first moving to the directory above the present one:

cd ..

Then the command

chmod 777 <directory name>

is issued. Finally, the user needs to move back to this directory using the command

cd <directory name>
Before going on, the user needs to load the project scheduling software. The following steps should be followed:

1) Insert the disk containing the project scheduling software into a disk drive. The disk is inserted with the label facing left and the edge with the write-protect notch going in first. The notch will be in the lower half.

2) Then the following command should be entered:

tar xvfn /dev/rfdA ps

where A is the number of the disk drive in which the disk containing the project scheduling software was inserted. The drive on the left is drive 0; the other is drive 1.


Once the program is loaded, the user need only type:

PS

to enter the project scheduling software. At this point the software will take control, and the user will be prompted for any other input.

The user can stop the software at any time by hitting <ctrl>-c. However, hitting <ctrl>-c while the software is saving the project values may cause errors in the output file. After exiting the software, the user should save any work done to their disk by following the procedures listed under Saving Work and Logging Out.


NORMAL RUN


A normal run consists of:

1) Logging On

2) Entering Data

3) Reviewing Results

4) Saving Work and Logging Out


Logging on has been previously described in Section I.

The user should follow the steps up to and including the invoking of the
software.


ENTERING  DATA 


The first item the user is prompted for is the function which the
user wants to perform. Figure 1 illustrates the menu. For a normal
run the user will choose option 1, to set up a new project. The
backspace key may be used at any time while responding to a prompt.


Welcome  to  the  Project  Scheduler 
Select  from: 

Enter  1 to  set  up  a new  project 

Enter  2 to  retrieve  old  project  values  for  manipulation 
Enter  function  choice: 


Figure  1 


Also, each response to a prompt should be followed by a carriage
return. NOTE: The system will not continue until the carriage
return is typed.

Once this option is chosen, the user will first be prompted for a
project identifier, as shown in Figure 2. The project identifier may
consist of any combination of characters, up to a total of 9.


Enter  project  identifier 


Figure  2 
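The identifier rule can be illustrated with a short C function, C being the language the PSS is written in. This is a minimal sketch, not code from the PSS source: the name `valid_identifier` is hypothetical, and rejecting an empty identifier is an added assumption (an empty response is one of the cases the notes under Response Formats warn can crash the program).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check for the project identifier prompt (Figure 2).
   `line` is a response as read by fgets, so it may end in '\n'. */
static bool valid_identifier(char *line)
{
    line[strcspn(line, "\n")] = '\0';   /* strip the carriage return */
    size_t n = strlen(line);
    return n >= 1 && n <= 9;            /* up to a total of 9 characters */
}
```

For example, "test1" (the identifier used in the Section VI sample run) is accepted, while a 10-character string is rejected.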


Next the user will be prompted for the present date. The user's
response should be in the form mm/dd/yy. NOTE: The system will
not continue until two slashes and a return have been entered. On
the same screen, the user will next be prompted for the version
number, which can be either real or integer. Finally, on
the same screen (Figure 3), the user will be prompted for the project
start date. The response again should be in the form mm/dd/yy,
and the software will not continue until two slashes and a carriage
return have been typed in.


Enter  date  (mm/dd/yy) : 

Enter  version  number: 

Enter  estimated  project  start  date  (mm/dd/yy): 
Figure  3 
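The mm/dd/yy rule (an integer, a slash, another integer, a slash and a final integer) can be checked with the standard `sscanf` function. This is a hedged sketch rather than the PSS code: the name `parse_date` is hypothetical, and the month and day range checks go beyond what the manual says the program enforces.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parser for the date prompts in Figure 3.
   Accepts "10/24/86"; rejects "10-24-86", "10/24" and "10/24/86x". */
static bool parse_date(const char *s, int *mm, int *dd, int *yy)
{
    char extra;
    /* The trailing " %c" only succeeds if a non-blank character follows
       the year, so n == 4 means there is junk after the date. */
    int n = sscanf(s, "%d/%d/%d %c", mm, dd, yy, &extra);
    if (n != 3)
        return false;
    return *mm >= 1 && *mm <= 12 && *dd >= 1 && *dd <= 31
        && *yy >= 0 && *yy <= 99;
}
```

A caller would re-prompt until `parse_date` returns true, which mirrors the manual's statement that the system will not continue until a properly slashed date is entered.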


The system will then prompt for the project mode (Figure 4). A
1, 2 or 3 is entered depending on whether the mode of the project is
organic, semi-detached or embedded, respectively.

Next, the system prompts for the estimate of KDSI. The range in
which the response must fall is also presented (Figure 5). If a
value is entered which does not fall within the range specified, the
system will prompt again until a valid value is entered. NOTE: The
user may enter a number without a fractional part.


Select from:

1 = organic mode

2 = semi-detached mode

3 = embedded mode

Enter project mode:

Figure 4


Range for mode = 2.00 - 512.00

Enter KDSI estimate:

Figure 5

The  last  input  items  which  must  be  entered  are  the  effort 
multipliers.  Each  multiplier  will  be  prompted  for  separately.  An 
example  of  the  format  of  the  screen  for  all  multipliers  is  shown  in 
figure  6.  An  example  of  each  screen  is  shown  in  Section  VIII. 


Enter  Effort  Multipliers 
Product  Attributes 
REQUIRED  SOFTWARE  RELIABILITY 
Range  = 0.75  to  1.4 
Enter  1 for  nominal  value 
RELY: 


Figure  6 


If an incorrect value is entered, the system will print an error
message and prompt again. NOTE: If an incorrect value is entered
three times, the user will be exited from the software to the
operating system and all values previously typed in will be lost.
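The range check and the three-attempt limit for the effort multipliers can be sketched as follows. This is an illustrative sketch, not the PSS source; the names `parse_in_range` and `read_multiplier` and the wording of the error message are assumptions. Passing a very large `max_tries` would approximate the KDSI prompt, which keeps asking until a valid value is entered.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse one response as a real or integer number and accept it only
   if lo <= value <= hi (e.g. 0.75 to 1.4 for RELY, Figure 6). */
static bool parse_in_range(const char *line, double lo, double hi, double *out)
{
    double v;
    char extra;
    if (sscanf(line, "%lf %c", &v, &extra) != 1)
        return false;              /* not a number, or trailing characters */
    if (v < lo || v > hi)
        return false;              /* outside the range shown on screen */
    *out = v;
    return true;
}

/* Prompt for one multiplier, allowing at most max_tries attempts.
   A false return corresponds to the manual's exit to the operating
   system: the caller is expected to stop. */
static bool read_multiplier(FILE *in, const char *name, double lo, double hi,
                            int max_tries, double *out)
{
    char line[80];
    for (int attempt = 0; attempt < max_tries; attempt++) {
        printf("%s: ", name);
        fflush(stdout);
        if (fgets(line, sizeof line, in) == NULL)
            return false;          /* end of input */
        if (parse_in_range(line, lo, hi, out))
            return true;
        printf("Value not within range %.2f to %.2f\n", lo, hi);
    }
    return false;
}
```

A caller would use something like `read_multiplier(stdin, "RELY", 0.75, 1.40, 3, &rely)` and exit if it returns false.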

After  the  last  effort  multiplier  has  been  entered,  the  system 
will  automatically  begin  calculating  results  and  when  finished  the 
review  results  menu  (figure  7)  will  appear  on  the  screen. 


Enter  1 to  see  input  variables  on  the  screen 

Enter 2 to see Basic Project Profile on the screen

Enter 3 to see Activity Distribution by Phase on the screen

Enter 4 to see Project Milestone Calendar on the screen

Enter 5 to save output in report format

Enter  6 to  save  project  values 

Enter  7 to  quit 

Select  from  1-7: 


Figure  7 

REVIEWING  RESULTS 


The review results menu, shown in Figure 7, allows the user to
review the results calculated by the project scheduler; the user need
only enter the desired menu choice.

Options 1 - 4 allow the user to review on the screen the reports described in
Section IV. The user need only type in the
desired option followed by a return. When done examining
the chosen report, the user types a carriage return to return to the
review results menu. Options 5, 6 and 7 are described in the next
section, Saving Work and Logging Out.

SAVING WORK AND LOGGING OUT

Option 5 of the review results menu should be chosen if the user
wants to print the reports on paper or save them on disk. This
option stores the reports in a file so that, after leaving the project
scheduling software, the user can use the lpr command to print the output
file. The following steps should be taken to do this:


GETTING  PRINTOUT  OF  REPORTS 


1) Choose option 5 from the review results menu.

2) Enter a filename. The system will prompt for the filename,
and the user must respond to continue.

3) The review results menu will appear on the screen again;
option 7 should be chosen if the user is ready to end
this session.

4) At system level type

lpr fname

where fname is the filename previously entered within
the project scheduler.

Option 6 should be chosen from the review results menu if the
user wants to save the input values for later review and/or for
running the project scheduler at a later time. Once option 6 is
chosen, the system will prompt for the filename. After the
filename is entered, the review results menu will appear on the screen again.

If the files created by the project scheduler are to be saved for
future use, the user must enter:

tar uvfn /dev/rfdA fname

where A is the drive in which the disk to be written to resides and fname is
the file to be saved on disk.

To log out, the user need only type

logout

NOTE: Be sure to save work on disk before typing logout, or all work
will be lost.


ERROR  MESSAGES 


The only error messages the user will get occur while entering
input. The following lists each type of input and what an error
message means when it occurs while entering that type of input.

INPUT                        ERROR MEANING

Main function menu choice    Choice not equal to 1 or 2

Project mode                 Mode entered not equal to 1, 2 or 3

KDSI estimate                Estimate entered not within range specified

Effort multipliers           Value entered not within range specified

If for some reason another type of error occurs, the user should
type <ctrl>-c. This will take the user back to the operating system,
and the user can type

ps

to begin the project scheduler again. Possible causes of error:

1) The file containing old project values is incomplete.

2) An invalid response was made to a prompt.


RESPONSE FORMATS


The following is a list of the inputs to the project scheduler
and the expected format of the user's response:


INPUT                         EXPECTED FORMAT

Main function menu choice     Integer 1 or 2
Project identifier            Character string of up to 9 characters
Date and start date           General form = mm/dd/yy: an integer followed
                              by a slash, followed by another integer, a
                              slash and a final integer
Version number                Real or integer number
Project mode                  Integer 1, 2 or 3
KDSI estimate                 Integer or real value
Effort multipliers            Integer or real value
Filename                      Character string of length < 19
Review results menu choice    Integer 1, 2, 3, 4, 5, 6 or 7


Notes about responding to prompts:

1) Backspace may be used when entering a response.

2) A carriage return must follow every response.

3) When consecutive carriage returns are entered, no values
will be given to these variables, so eventually the
program will crash.

4) If more than one character is typed for the mode or a
menu choice, the program will crash.

5) If invalid data is entered, such as a character for an
integer, it is converted to its integer equivalent.


REVIEWING  PROJECT  DATA 


Reviewing project data is very similar to a normal run except for
two things:

1) The user is prompted for the name of the file in which the project
values are stored.

2) For each listed value, the user is asked whether he wants to change
it.

Figure 8 shows an example of what the screen looks like when the
option to change the mode is taken. NOTE: To change a value, the user
must enter a capital Y at the prompt. Anything else typed in is
considered a no.


Figure 8


SECTION III


ERROR TERMINATION


SECTION  IV 


REPORT  FORMATS 


Report #1


DESCRIPTIVE  PROJECT  VALUES 


PROJECT IDENTIFIER:                          DATE:

VERSION  NUMBER: 

ESTIMATED  PROJECT  STARTING  DATE: 

ESTIMATED  PROJECT  LENGTH  (WEEKS): 

PROJECT  MODE: 

KDSI ESTIMATE:


Product Attributes

RELY:
DATA:
CPLX:


Computer Attributes

TIME:
STOR:
VIRT:
TURN:


Personnel Attributes

ACAP:
AEXP:
PCAP:
VEXP:
LEXP:


Project Attributes

MODP:
TOOL:
SCED:


MM:
TDEV:
PRODUCTIVITY:
PROJECT AVERAGE FSP:


Report #2


BASIC PROJECT PROFILE

QUANTITY                            MODE =


Total effort (MM)
    Plans and requirements
    Product design
    Programming
        Detailed design
        Code and unit test
    Integration and test
Total schedule (months)
    Plans and requirements
    Product design
    Programming
    Integration and test
Average personnel (FSP)
    Plans and requirements
    Product design
    Programming
    Integration and test
Productivity (DSI/MM)
Code and unit test only (DSI/MM)


Report #3


ACTIVITY DISTRIBUTION BY PHASE

                              Phase
                              Plans and       Product         Programming     Integration
                              Requirements    Design                          and Test
Activity                      Percent    FSP  Percent    FSP  Percent    FSP  Percent    FSP

Requirements Analysis
Product Design
Programming
Test Planning
Verification and Validation
Project Office
CM/QA
Manuals

Total


Report #4


PROJECT IDENTIFIER:
VERSION NUMBER:
DATE:


PROJECT MILESTONE CALENDAR

ESTIMATED PROJECT STARTING DATE:
ESTIMATED PROJECT LENGTH (WEEKS):


REVIEW                          WORK PRODUCTS REVIEWED

PRODUCT FEASIBILITY REVIEW      SYSTEM DEFINITION
                                PROJECT PLAN
SOFTWARE REQUIREMENTS REVIEW    SOFTWARE REQUIREMENTS SPECIFICATION
                                PRELIMINARY USER'S MANUAL
                                PRELIMINARY VERIFICATION PLAN
PRELIMINARY DESIGN REVIEW       ARCHITECTURAL DESIGN DOCUMENT
CRITICAL DESIGN REVIEW          DETAILED DESIGN SPECIFICATION
                                USER'S MANUAL
                                SOFTWARE VERIFICATION PLAN
SOURCE CODE REVIEW              WALKTHROUGHS & INSPECTIONS OF SOURCE CODE
ACCEPTANCE TEST REVIEW          ACCEPTANCE TEST PLAN
PRODUCT RELEASE REVIEW          ALL OF ABOVE DOCUMENTS
PROJECT POST-MORTEM             PROJECT LEGACY


SECTION  V 


TECHNICAL  SPECIFICATIONS 


Packaging  Specifications 


The Project Scheduling Software consists of four files of source code.

compute.c - This serves as the main section of code. It coordinates the
calling of all of the other functions. It also contains the code to
compute the MM, TDEV, productivity, project length, project average FSP,
FSP distribution, and milestones.

tablemanip.c - This file coordinates the accessing of the
reference table values and computes the effort, schedule, and activity
distribution output values.

in.c - This file contains the input and output code. The input code
coordinates the interactive input and fills the global input program
values. It is also in this code that all reading from old project files
and writing to files (for storage of project values or output reports) is
managed. The output portion of this code coordinates the formatting of
the output reports.

defs.h - This file contains the declarations for all of the global
variables for the software.


The Project Scheduling Software is compiled into compute.o, tablemanip.o,
and in.o, and linked into the executable ps, using the following command:
cc -o ps compute.c tablemanip.c in.c -lm


The software is then executed by typing: ps


PROJECT SCHEDULING SOFTWARE - COMPUTATIONAL SPECIFICATIONS


Calculations used in the Project Scheduling Software (PSS):

The PSS implements the Intermediate COCOMO model as described in Software
Engineering Economics by Barry Boehm (1981). The following is a description of
the equations and procedures used.


1) Intermediate COCOMO Nominal Effort Estimating Equations:

Development Mode     Nominal Effort Equation

Organic              (MM)nom = 3.2(KDSI)**1.05
Semi-detached        (MM)nom = 3.0(KDSI)**1.12
Embedded             (MM)nom = 2.8(KDSI)**1.20


2) MM = (MM)nom * (product of the 15 effort multipliers) [rounded]


3) Basic and Intermediate COCOMO Schedule Estimating Equations:

Development Mode     Schedule Equation

Organic              TDEV = 2.5(MM)**0.38 [rounded]
Semi-detached        TDEV = 2.5(MM)**0.35 [rounded]
Embedded             TDEV = 2.5(MM)**0.32 [rounded]


4)  Productivity  = (KDSI  * 1000)  / MM  [rounded] 


5)  Project  Average  FSP  = MM  / TDEV 


6) Estimated project length = (time for the plans and requirements phase, in weeks)
   + (TDEV * 4.33) [conversion of TDEV to weeks]


7) Milestones: The week number assignments for each of the project phases are
calculated by: 1) converting to weeks (value * 4.33) the corresponding
values for the allotment of schedule time for the phase in the schedule
distribution, and 2) creating a time line for the project by adding each
allotment of weeks to the sum of the allotments of weeks for the previous
phases. The work products are assigned using the table of Reviews and
Milestones in the Phased Life-cycle Model from Fairley (1985), Software
Engineering Concepts (page 42). This table is presented in the appendix.


8)  Calculation  of  the  Distribution  Outputs: 


8a) Effort Distribution: Each row in the Effort Distribution output is
calculated by multiplying MM by the appropriate percentage value from the
Effort Distribution Table. The appropriate percentage value is found by
identifying the correct reference table using the mode and KDSI combination,
and then selecting the percentage value in the row that corresponds to the
activity value currently being calculated. If KDSI is not an exact reference
table column value, interpolation is performed to determine the percentage
value to be used.
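The table lookup with interpolation in step 8a can be sketched as below. The manual does not say what KDSI each reference-table column stands for or which kind of interpolation is used; this sketch assumes linear interpolation between Boehm's nominal sizes for small, intermediate, medium, large and very large products (2, 8, 32, 128 and 512 KDSI). That assumption is consistent with the sample run: 80 KDSI lies halfway between 32 and 128, and the embedded Programming percentage of 52.50% in the Basic Project Profile lies halfway between the appendix values 54 and 51. The function names are hypothetical.

```c
/* Assumed KDSI for the S, I, M, L and VL reference-table columns. */
static const double SIZE_KDSI[5] = { 2.0, 8.0, 32.0, 128.0, 512.0 };

/* Linear interpolation of one percentage row (one value per size). */
static double interpolate_percent(const double row[5], double kdsi)
{
    if (kdsi <= SIZE_KDSI[0])
        return row[0];
    for (int i = 1; i < 5; i++) {
        if (kdsi <= SIZE_KDSI[i]) {
            double t = (kdsi - SIZE_KDSI[i - 1]) / (SIZE_KDSI[i] - SIZE_KDSI[i - 1]);
            return row[i - 1] + t * (row[i] - row[i - 1]);
        }
    }
    return row[4];   /* beyond the largest column: use the last value */
}

/* Step 8a: effort (MM) for one row of the Effort Distribution. */
static double phase_effort(double mm, const double row[5], double kdsi)
{
    return mm * interpolate_percent(row, kdsi) / 100.0;
}
```

For example, with the embedded Programming row {60, 57, 54, 51, 48}, 80 KDSI and MM = 538, `phase_effort` gives about 282.45 MM, the value in the sample Basic Project Profile. Step 8b applies the same lookup with TDEV in place of MM.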

8b) Schedule Distribution: Each row in the Schedule Distribution output
is calculated by multiplying TDEV by the appropriate percentage value from the
Schedule Distribution Table. The appropriate percentage value is found by
identifying the correct reference table using the mode and KDSI combination,
and then selecting the percentage value in the row that corresponds to the
activity value currently being calculated. If KDSI is not an exact reference
table column value, interpolation is performed to determine the percentage
value to be used.

8c) Average Personnel (FSP) Distribution: Each row in the Average
Personnel (FSP) Distribution is calculated by dividing the corresponding MM
value from the Effort Distribution by the corresponding TDEV value from the
Schedule Distribution.

8d) Activity Distribution by Phase: The Activity Distributions are
calculated using the Average Personnel (FSP) Distribution. Each of the four
values in the Average Personnel (FSP) Distribution is expanded to show the
breakdown, in terms of percentage of personnel for that phase, over the
eight project activities that occur in some percentage throughout the whole
project. Each Average Personnel (FSP) Distribution value is multiplied by the
appropriate percentage value from the Project Activity Distribution by Phase
Tables. The appropriate percentage value is found by identifying the correct
reference table using the mode and KDSI combination, and then selecting the
percentage value in the row that corresponds to the activity value currently
being calculated. If KDSI is not an exact reference table column value,
interpolation is performed to determine the percentage value to be used.
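Steps 8c and 8d amount to one division per phase and one multiplication per activity. The sketch below uses hypothetical names and assumes the eight activity percentages passed in have already been interpolated as in step 8a.

```c
#define NACTIVITIES 8   /* requirements analysis ... manuals */

/* Step 8c: average personnel for one phase (effort / schedule). */
static double phase_fsp(double phase_mm, double phase_months)
{
    return phase_mm / phase_months;
}

/* Step 8d: spread a phase's FSP over the eight activities using that
   phase's (already interpolated) activity percentages. */
static void activity_fsp(double fsp, const double percent[NACTIVITIES],
                         double out[NACTIVITIES])
{
    for (int i = 0; i < NACTIVITIES; i++)
        out[i] = fsp * percent[i] / 100.0;
}
```

For the Plans and Requirements phase of the sample run, `phase_fsp(43.04, 6.46)` gives about 6.66, and a Requirements Analysis share of 45% gives about 3.00 FSP, as in the Activity Distribution by Phase report. Because each phase's percentages total 100, the activity values add back up to the phase FSP.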


SECTION  VI 


SAMPLE  RUN 


DESCRIPTIVE PROJECT VALUES


PROJECT IDENTIFIER: test1
VERSION NUMBER: 1
ESTIMATED PROJECT STARTING DATE: 10/24/86
ESTIMATED PROJECT LENGTH (WEEKS): 110
PROJECT MODE: 3
KDSI ESTIMATE: 80.0


Product Attributes    Computer Attributes   Personnel Attributes  Project Attributes

RELY: 1.00            TIME: 1.00            ACAP: 1.00            MODP: 1.00
DATA: 1.00            STOR: 1.00            AEXP: 1.00            TOOL: 1.00
CPLX: 1.00            VIRT: 1.00            PCAP: 1.00            SCED: 1.00
                      TURN: 1.00            VEXP: 1.00
                                            LEXP: 1.00

MM: 538.0
TDEV: 19.0
PRODUCTIVITY: 149
PROJECT AVERAGE FSP: 28.32


5/13/86


BASIC PROJECT PROFILE


QUANTITY                            MODE = Embedded

Total effort (MM)                                538.00
    Plans and requirements              8.00%     43.04
    Product design                     18.00%     96.84
    Programming                        52.50%    282.45
        Detailed design                25.50%    137.19
        Code and unit test             27.00%    145.26
    Integration and test               29.50%    158.71
Total schedule (months)                           19.00
    Plans and requirements             34.00%      6.46
    Product design                     35.00%      6.65
    Programming                        38.00%      7.22
    Integration and test               27.00%      5.13
Average personnel (FSP)                           28.32
    Plans and requirements             23.53%      6.66
    Product design                     51.43%     14.56
    Programming                       138.16%     39.12
    Integration and test              109.26%     30.94
Productivity (DSI/MM)                               149
Code and unit test only (DSI/MM)                    551


ACTIVITY DISTRIBUTION BY PHASE

                              Phase
                              Plans and       Product         Programming     Integration
                              Requirements    Design                          and Test
Activity                      Percent    FSP  Percent    FSP  Percent    FSP  Percent    FSP

Requirements Analysis           45.00   3.00    10.00   1.46     3.00   1.17     2.00   0.62
Product Design                  14.50   0.97    42.00   6.12     6.00   2.35     4.00   1.24
Programming                      7.00   0.47    12.50   1.82    55.00  21.52    42.00  12.99
Test Planning                    4.50   0.30     6.50   0.95     6.50   2.54     4.00   1.24
Verification and Validation      8.50   0.57     8.50   1.24    10.50   4.11    24.00   7.43
Project Office                  11.00   0.73    10.00   1.46     6.50   2.54     7.50   2.32
CM/QA                            4.00   0.27     3.00   0.44     7.00   2.74     9.00   2.78
Manuals                          5.50   0.37     7.50   1.09     5.50   2.15     7.50   2.32

Total                             100   6.66      100  14.56      100  39.12      100  30.94

PROJECT MILESTONE CALENDAR

PROJECT IDENTIFIER: test1        ESTIMATED PROJECT STARTING DATE: 10/24/86

VERSION NUMBER: 1                ESTIMATED PROJECT LENGTH (WEEKS): 110

DATE: 5/13/86


REVIEW                          WORK PRODUCTS REVIEWED                      WEEK #

PRODUCT FEASIBILITY REVIEW      SYSTEM DEFINITION                               14
                                PROJECT PLAN
SOFTWARE REQUIREMENTS REVIEW    SOFTWARE REQUIREMENTS SPECIFICATION             28
                                PRELIMINARY USER'S MANUAL
                                PRELIMINARY VERIFICATION PLAN
PRELIMINARY DESIGN REVIEW       ARCHITECTURAL DESIGN DOCUMENT                   42
CRITICAL DESIGN REVIEW          DETAILED DESIGN SPECIFICATION                   57
                                USER'S MANUAL
                                SOFTWARE VERIFICATION PLAN
SOURCE CODE REVIEW              WALKTHROUGHS & INSPECTIONS OF SOURCE CODE       73
ACCEPTANCE TEST REVIEW          ACCEPTANCE TEST PLAN                            88
PRODUCT RELEASE REVIEW          ALL OF ABOVE DOCUMENTS                         110
PROJECT POST-MORTEM             PROJECT LEGACY                                 110


SECTION  VII 


GLOSSARY 




activity  distribution  by  phase  - break-down  of  project  average  personnel  (PAFSP) 
required  to  perform  each  of  8 activities  during  each  phase  of  the  project 

average  personnel  (FSP)  distribution  - break-down  of  project  average  personnel 
(PAFSP)  into  number  of  personnel  required  to  complete  each  phase  of  the 
project 

cd  - unix  command  to  change  directory 

chmod  - unix  command  to  change  read/write/execute  permissions 

cocomo  - constructive  cost  model  of  software  development  cost  estimation 

descriptive project values - one of the output reports produced by PSS. Shows
all the values input by the user, and the basic cocomo values calculated by
the program: mm, tdev, productivity, and project average FSP

effort  distribution  - break-down  of  total  effort  (mm)  into  effort  required  to 
complete  each  phase  of  the  project 

effort  multiplier  - value  within  a given  range  which  represents  the  project’s 
rating  in  terms  of  one  of  15  software  cost  drivers,  such  as  complexity  or 
programmer  ability 

embedded  mode  - software  which  interacts  directly  with  the  hardware;  corresponds 
to  systems  programming 

error  message  - a message  printed  on  the  screen  by  the  PSS  program  indicating 
that  the  user  has  entered  an  incorrect  value 

fsp  - full-time  software  personnel 

kdsi  - thousand  delivered  source  instructions;  the  anticipated  size  of  the 
project  in  terms  of  source  code  instructions 

load  - the  process  of  copying  the  PSS  program  into  the  computer’s  main  memory  in 
preparation  for  running  the  program 

log  on  - the  process  of  identifying  yourself  to  the  computer  in  order  to  be 
admitted  to  the  operating  system 

login - a name which identifies the user and is used to gain access to the
operating system; the PSS user should use the login cs327

logout  - the  command  used  to  exit  from  the  unix  operating  system  after  a PSS 
session,  after  all  work  has  been  saved  on  disk  using  tar  uvfn  command 

lpr  fname  - the  unix  system  command  used  to  obtain  a paper  copy  of  the  PSS 
program  output;  substitute  the  name  of  the  file  in  which  you  saved  the 
output  for  fname 


milestone - a significant event such as the completion of a phase or an
important review in the life cycle of the software project; expressed as
the week number of the project by PSS

mm  - the  total  amount  of  effort,  in  man-months,  required  to  complete  a software 
project  of  a given  mode,  size  and  description 

mode - the general category to which the user's project belongs: 1 = organic
(applications programs); 2 = semi-detached (utility programs); 3 = embedded
(systems programs)

organic mode - programs which use an environment provided by a language
compiler; corresponds to applications programming

pafsp  (project  average  fsp)  - the  average  number  of  full-time  software  personnel 
needed  to  staff  a project  of  a given  mode,  size  and  description 

prod(uctivity)  - the  number  of  delivered  source  instructions  per  man-month  for  a 
project  of  given  mode,  size  and  description 

project identifier - any combination of 9 or fewer characters which the user
chooses and enters to identify the project which he/she wishes to schedule

ps  - the  command  used  to  invoke  the  PSS  program  to  start  it  running 

S9000  - the  IBM  computer  system  on  which  the  Project  Scheduler  Software  runs 

schedule  distribution  - break-down  of  elapsed  time  (tdev)  into  time  required  to 
complete  each  phase  of  the  project 

semi-detached  mode  - programs  which  provide  processing  environments  and 
sophisticated  use  of  the  operating  system;  corresponds  to  utility 
programming 

software  cost  driver  - one  of  15  factors  which  strongly  influence  the  cost  in 
terms  of  time  and  money  to  complete  a software  development  project  (see 
Appendix,  Software  Cost  Driver  Rating  Reference  Table  I and  II) 

tar  uvfn  /dev/rfdA  fname(s)  - unix  system  command  to  save  any  files  created  when 
running  the  PSS  program;  substitute  the  correct  drive  number  (0  or  1)  for 
A,  and  the  name(s)  of  your  file(s)  for  fname(s) 

tar xvfn /dev/rfdA ps - the unix command used to tell the operating system to
load the PSS program in preparation for running the program; substitute the
correct drive number (0 or 1) for A in the command

tdev  - elapsed  time,  in  months,  required  to  complete  a software  project  of  a 
given  mode,  size  and  description 

unix  - the  operating  system  running  on  the  IBM  S9000 

version number - a real or integer number chosen and entered by the user to
identify  which  version  of  a project  he/she  wishes  to  schedule 
Description of the mode project value. (Boehm, 1981)
Software Cost Driver Ratings


Product attributes
RELY  Very Low:   Effect: slight inconvenience
      Low:        Low, easily recoverable losses
      Nominal:    Moderate, recoverable losses
      High:       High financial loss
      Very High:  Risk to human life
DATA  Low:        DB bytes/Prog. DSI < 10
      Nominal:    10 ≤ D/P < 100
      High:       100 ≤ D/P < 1000
      Very High:  D/P ≥ 1000
CPLX  See Table 8-4

Computer attributes
TIME  Nominal:    < 50% use of available execution time
      High:       70%
      Very High:  85%
      Extra High: 95%
STOR  Nominal:    < 50% use of available storage
      High:       70%
      Very High:  85%
      Extra High: 95%
VIRT  Low:        Major change every 12 months; minor: every month
      Nominal:    Major: 6 months; minor: 2 weeks
      High:       Major: 2 months; minor: 1 week
      Very High:  Major: 2 weeks; minor: 2 days
TURN  Low:        Interactive
      Nominal:    Average turnaround < 4 hours
      High:       4-12 hours
      Very High:  > 12 hours

Personnel attributes
ACAP  Very Low:   15th percentile*
      Low:        35th percentile
      Nominal:    55th percentile
      High:       75th percentile
      Very High:  90th percentile
AEXP  Very Low:   < 4 months experience
      Low:        1 year
      Nominal:    3 years
      High:       6 years
      Very High:  12 years
PCAP  Very Low:   15th percentile*
      Low:        35th percentile
      Nominal:    55th percentile
      High:       75th percentile
      Very High:  90th percentile
VEXP  Very Low:   < 1 month experience
      Low:        4 months
      Nominal:    1 year
      High:       3 years
LEXP  Very Low:   < 1 month experience
      Low:        4 months
      Nominal:    1 year
      High:       3 years

Project attributes
MODP  Very Low:   No use
      Low:        Beginning use
      Nominal:    Some use
      High:       General use
      Very High:  Routine use
TOOL  Very Low:   Basic microprocessor tools
      Low:        Basic mini tools
      Nominal:    Basic midi/maxi tools
      High:       Strong maxi programming, test tools
      Very High:  Add requirements, design, management, documentation tools
SCED  Very Low:   75% of nominal
      Low:        85%
      Nominal:    100%
      High:       130%
      Very High:  160%

* Team rating criteria: analysis (programming) ability, efficiency, ability to communicate and cooperate.


Software Cost Driver Rating Reference Table I.

(Boehm, 1981)


Software Development Effort Multipliers


                                                        Ratings
Cost Drivers                              VL     L      N      H      VH     XH

(VL = Very Low, L = Low, N = Nominal, H = High, VH = Very High, XH = Extra High)

Product Attributes
RELY  Required software reliability       .75    .88    1.00   1.15   1.40
DATA  Data base size                             .94    1.00   1.08   1.16
CPLX  Product complexity                  .70    .85    1.00   1.15   1.30   1.65
Computer Attributes
TIME  Execution time constraint                         1.00   1.11   1.30   1.66
STOR  Main storage constraint                           1.00   1.06   1.21   1.56
VIRT  Virtual machine volatility*                .87    1.00   1.15   1.30
TURN  Computer turnaround time                   .87    1.00   1.07   1.15
Personnel Attributes
ACAP  Analyst capability                  1.46   1.19   1.00   .86    .71
AEXP  Applications experience             1.29   1.13   1.00   .91    .82
PCAP  Programmer capability               1.42   1.17   1.00   .86    .70
VEXP  Virtual machine experience*         1.21   1.10   1.00   .90
LEXP  Programming language experience     1.14   1.07   1.00   .95
Project Attributes
MODP  Use of modern programming practices 1.24   1.10   1.00   .91    .82
TOOL  Use of software tools               1.24   1.10   1.00   .91    .83
SCED  Required development schedule       1.23   1.08   1.00   1.04   1.10

* For a given software product, the underlying virtual machine is the complex of hardware and software (OS,
DBMS, etc.) it calls on to accomplish its tasks.


Software Cost Driver Rating Reference Table II.

(Boehm, 1981)
Reference data used to assign milestones and work products

(Fairley, 1985)


Phase Distribution of Effort and Schedule: All Modes


Reference tables used to calculate the Effort and Schedule distributions

(Boehm, 1981)


Project Activity Distribution by Phase: Organic Mode

(Product size: S = small, I = intermediate, M = medium, L = large)

Product size                         S    I    M    L

Overall phase percentage
  Plans and requirements             6    6    6    6
  Product design                    16   16   16   16
  Programming                       68   65   62   59
  Integration and test              16   19   22   25

Activity percentage              Plans and      Product        Programming    Integration
                                 Requirements   Design                        and Test
  Requirements analysis          46             15             5              3
  Product design                 20             40             10             6
  Programming                    3              14             58             34
  Test planning                  3              5              4              2
  Verification and validation    6              6              6              34
  Project office                 15             11             6              7
  CM/QA                          2              2              6              7
  Manuals                        5              7              5              7

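Project Activity Distribution by Phase: Semidetached Mode

(Product size: S = small, I = intermediate, M = medium, L = large, VL = very large)

Product size                         S    I    M    L   VL

Overall phase percentage
  Plans and requirements             7    7    7    7    7
  Product design                    17   17   17   17   17
  Programming                       64   61   58   55   52
  Integration and test              19   22   25   28   31

Activity percentage, Plans and requirements phase
  Requirements analysis             48   47   46   45   44
  Product design                    16 16.5   17 17.5   18
  Programming                      2.5  3.5  4.5  5.5  6.5
  Test planning                    2.5    3  3.5    4  4.5
  Verification and validation        6  6.5    7  7.5    8
  Project office                  15.5 14.5 13.5 12.5 11.5
  CM/QA                            3.5    3    3    3  2.5
  Manuals                            6    6  5.5    5    5

Activity percentage, Product design phase
  Requirements analysis           12.5 12.5 12.5 12.5 12.5
  Product design                    41   41   41   41   41
  Programming                       12 12.5   13 13.5   14
  Test planning                    4.5    5  5.5    6  6.5
  Verification and validation        6  6.5    7  7.5    8
  Project office                    13   12   11   10    9
  CM/QA                              3  2.5  2.5  2.5    2
  Manuals                            8    8  7.5    7    7

Activity percentage, Programming phase
  Requirements analysis              4    4    4    4    4
  Product design                     8    8    8    8    8
  Programming                     56.5 56.5 56.5 56.5 56.5
  Test planning                      4  4.5    5  5.5    6
  Verification and validation        7  7.5    8  8.5    9
  Project office                   7.5    7  6.5    6  5.5
  CM/QA                              7  6.5  6.5  6.5    6
  Manuals                            6    6  5.5    5    5

Activity percentage, Integration and test phase
  Requirements analysis            2.5  2.5  2.5  2.5  2.5
  Product design                     5    5    5    5    5
  Programming                       33   35   37   39   41
  Test planning                    2.5  2.5    3    3  3.5
  Verification and validation       32   31 29.5 28.5   27
  Project office                   8.5    8  7.5    7  6.5
  CM/QA                            8.5    8    8    8  7.5
  Manuals                            8    8  7.5    7    7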

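Project Activity Distribution by Phase: Embedded Mode

(Product size: S = small, I = intermediate, M = medium, L = large, VL = very large)

Product size                         S    I    M    L   VL

Overall phase percentage
  Plans and requirements             8    8    8    8    8
  Product design                    18   18   18   18   18
  Programming                       60   57   54   51   48
  Integration and test              22   25   28   31   34

Activity percentage, Plans and requirements phase
  Requirements analysis             50   48   46   44   42
  Product design                    12   13   14   15   16
  Programming                        2    4    6    8   10
  Test planning                      2    3    4    5    6
  Verification and validation        6    7    8    9   10
  Project office                    16   14   12   10    8
  CM/QA                              5    4    4    4    3
  Manuals                            7    7    6    5    5

Activity percentage, Product design phase
  Requirements analysis             10   10   10   10   10
  Product design                    42   42   42   42   42
  Programming                       10   11   12   13   14
  Test planning                      4    5    6    7    8
  Verification and validation        6    7    8    9   10
  Project office                    15   13   11    9    7
  CM/QA                              4    3    3    3    2
  Manuals                            9    9    8    7    7

Activity percentage, Programming phase
  Requirements analysis              3    3    3    3    3
  Product design                     6    6    6    6    6
  Programming                       55   55   55   55   55
  Test planning                      4    5    6    7    8
  Verification and validation        8    9   10   11   12
  Project office                     9    8    7    6    5
  CM/QA                              8    7    7    7    6
  Manuals                            7    7    6    5    5

Activity percentage, Integration and test phase
  Requirements analysis              2    2    2    2    2
  Product design                     4    4    4    4    4
  Programming                       32   36   40   44   48
  Test planning                      3    3    4    4    5
  Verification and validation       30   28   25   23   20
  Project office                    10    9    8    7    6
  CM/QA                             10    9    9    9    8
  Manuals                            9    9    8    7    7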

Reference  tables  used  to  calculate  the  activity  distributions 

(Boehm,  1981) 